MOLECULAR TYPING OF GROUP B STREPTOCOCCI



Field of the invention 

The present invention relates to molecular methods of typing group B streptococci, and to polynucleotides useful in such methods.

Background to the invention 

Group B streptococcus (GBS), Streptococcus agalactiae, is the commonest cause of neonatal and obstetric sepsis and an increasingly important cause of septicaemia in elderly and immunocompromised patients. The incidence of neonatal GBS sepsis has been reduced in recent years by the use of intrapartum antibiotic prophylaxis, but this approach has many problems. In future, vaccination is likely to be preferred, and there has been considerable progress in the development of conjugate polysaccharide GBS vaccines.

Before conjugate vaccines are introduced, extensive epidemiological and related studies will be required to assess not only the burden of disease but also the distribution of GBS types (including capsular polysaccharide gene serotypes and serosubtypes, protein antigen gene subtypes and mobile genetic element subtypes), in order to determine the optimal formulation of vaccine antigens. A type distribution based on one geographic location or on small numbers of patients may not be generally applicable. Continued monitoring will therefore be necessary to assess the suitability of combinations of GBS vaccine antigens for different target populations in different geographic locations.

Nine capsular polysaccharide GBS serotypes have been described (Harrison et al., 1998; Hickman et al., 1999). Various serotyping methods have been used, including immunoprecipitation (Wilkinson and Moody, 1969), enzyme immunoassay (Holm and Hakansson, 1988), coagglutination (Hakansson et al., 1992), counter-immunoelectrophoresis and capillary precipitation (Triscott and Davies, 1979), latex agglutination (Zuerlein et al., 1991), fluorescence microscopy (Cropp et al., 1974) and inhibition ELISA (Arakere et al., 1999). These methods are labour-intensive and require high-titre serotype-specific antisera, which are expensive and difficult to make and are commercially available for only six serotypes, Ia to V (Arakere et al., 1999).

Molecular genotyping methods, such as pulsed-field gel electrophoresis (Rolland et al., 1999) and restriction endonuclease analysis (Nagano et al., 1991), are useful for epidemiological studies but do not generally identify serotypes. Consequently, there is a need for a reliable molecular method for GBS serotype identification.

Summary of the invention 

We have identified specific regions of inter-type sequence heterogeneity within the genome of group B streptococci that can be used to distinguish different types (including capsular polysaccharide gene serotypes and serosubtypes, protein antigen gene subtypes and mobile genetic element subtypes). We have shown that molecular methods that detect these sequence heterogeneities can be used to distinguish and type group B streptococci accurately.

Accordingly, in a first aspect the present invention provides a method of typing a group B streptococcal bacterium, which method comprises analysing the nucleotide sequence of one or more regions within the cpsD, cpsE, cpsF, cpsG and/or cpsI/M genes of said bacterium, said region(s) comprising one or more nucleotides whose sequence varies between types.

In particular, the nucleotide sequence may be analysed at one or more positions corresponding to positions 62, 78-86, 138, 139, 144, 198, 204, 211, 218, 240, 249, 300, 321, 419, 429, 437, 457, 466, 486, 602, 606, 627, 636, 645, 803, 971, 1026, 1044, 1173, 1194, 1251, 1278, 1413, 1495, 1500, 1501, 1512, 1518, 1527, 1595, 1611, 1620, 1627, 1629, 1655, 1832, 1856, 1866, 1871, 1892, 1971, 2026, 2088, 2134, 2187 and 2196 as shown in Figure 1.

Preferably, at least one region is within a sequence delineated by the 3' 136 bases of the cpsE gene and the 5' 218 bases of the cpsG gene of the cpsE-cpsF-cpsG gene cluster of said group B streptococcal bacterium. In particular, the nucleotide sequence may be analysed at one or more positions corresponding to positions 1413, 1495, 1500, 1501, 1512, 1518, 1527, 1595, 1611, 1620, 1627, 1629, 1655, 1832, 1856, 1866, 1871, 1892, 1971, 2026, 2088, 2134, 2187 and 2196 as shown in Figure 1.

In one embodiment, at least one region is within the cpsI/M genes of said group B streptococcal bacterium.

We have also shown that a number of surface protein antigen genes, including the rib, alp2 and alp3 genes, and five mobile genetic elements may be used for molecular subtyping of GBS. Accordingly, the present invention also provides a method of typing a group B streptococcal bacterium, which method comprises determining the presence or absence in the genome of said bacterium of one or more surface protein antigen genes selected from a rib, alp2 or alp3 gene, and/or one or more mobile genetic elements selected from IS861, IS1548, IS1381, ISSa4 and GBSi1. Preferably, such a method is combined with the above methods of the invention.

The nucleotide sequence analysis step may comprise sequencing said one or more regions. Alternatively, or in addition, the nucleotide sequence analysis step may comprise determining whether a polynucleotide obtained from said bacterium selectively hybridises to a polynucleotide probe comprising one or more of said regions, preferably to one or more of a plurality of polynucleotide probes corresponding to one or more of said regions.

In a preferred embodiment, where hybridisation to a plurality of probes is used as the means of analysis, the plurality of polynucleotide probes is present as a microarray.

In another embodiment, the nucleotide sequence analysis step comprises an amplification step using one or more primers, at least one of which hybridises specifically to a sequence that differs between types. Typically, primer pairs are used, at least one primer of which hybridises specifically to a sequence that differs between types. Preferably, said primers are selected from the primers shown in Table 2 and/or Table 6 and/or Table 10.

In a second aspect, the present invention provides a polynucleotide consisting essentially of at least 10 contiguous nucleotides corresponding to a region within a cpsD-cpsE-cpsF-cpsG gene of a group B streptococcal bacterium, said polynucleotide comprising one or more nucleotides which differ between GBS types.

Preferably, the nucleotides which differ between GBS types correspond to one or more of positions 62, 78-86, 138, 139, 144, 198, 204, 211, 218, 240, 249, 300, 321, 419, 429, 437, 457, 466, 486, 602, 606, 627, 636, 645, 803, 971, 1026, 1044, 1173, 1194, 1251, 1278, 1413, 1495, 1500, 1501, 1512, 1518, 1527, 1595, 1611, 1620, 1627, 1629, 1655, 1832, 1856, 1866, 1871, 1892, 1971, 2026, 2088, 2134, 2187 and 2196 as shown in Figure 1.

The present invention also provides a polynucleotide consisting essentially of at least 10 contiguous nucleotides corresponding to a region within a sequence delineated by the 3' 136 base pairs of cpsE and the 5' 218 base pairs of cpsG of the cpsE-cpsF-cpsG gene cluster of a group B streptococcal bacterium, said polynucleotide comprising one or more nucleotides which differ between GBS types.

Preferably, the nucleotides which differ between group B streptococcal types correspond to one or more of positions 1413, 1495, 1500, 1501, 1512, 1518, 1527, 1595, 1611, 1620, 1627, 1629, 1655, 1832, 1856, 1866, 1871, 1892, 1971, 2026, 2088, 2134, 2187 and 2196 as shown in Figure 1.

The present invention also provides a polynucleotide consisting essentially of at least 10 contiguous nucleotides corresponding to a region within a cpsI/M gene of a group B streptococcal bacterium, said polynucleotide comprising one or more nucleotides which differ between group B streptococcal types.

Preferably the polynucleotide is selected from the nucleotide sequences 
shown in Table 2. 

The present invention further provides a polynucleotide consisting essentially of at least 10 contiguous nucleotides corresponding to a region within a rib, alp2 or alp3 gene of a group B streptococcal bacterium, said polynucleotide comprising one or more nucleotides which differ between GBS protein antigen gene subtypes.

Preferably, the polynucleotide is selected from the nucleotide sequences shown in Table 6.

The present invention further provides a polynucleotide consisting essentially of at least 10 contiguous nucleotides corresponding to a region within IS861, IS1548, IS1381, ISSa4 and/or GBSi1 of a group B streptococcal bacterium, said polynucleotide comprising one or more nucleotides which differ between GBS mobile genetic element subtypes.

Preferably the polynucleotide is selected from the nucleotide sequences 
shown in Table 10. 

The polynucleotides of the invention may be used in a method of typing, such as serotyping and/or subtyping, a group B streptococcal bacterium.

In a third aspect the present invention provides a composition 
comprising a plurality of polynucleotides of the second aspect of the invention. 
The composition may be used in a method of typing, such as serotyping and/or 
subtyping, a group B streptococcal bacterium. 

In a fourth aspect the present invention provides a microarray comprising a plurality of polynucleotides according to the second aspect of the invention. The microarray may be used in a method of typing, such as serotyping and/or subtyping, a group B streptococcal bacterium.

Detailed description of the invention

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art (e.g., in cell culture, molecular genetics, nucleic acid chemistry, hybridization techniques and biochemistry). Standard techniques are used for molecular, genetic and biochemical methods (see generally Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd ed. (1989), Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.; and Ausubel et al., Short Protocols in Molecular Biology, 4th ed. (1999), John Wiley & Sons, Inc., together with the full version entitled Current Protocols in Molecular Biology, which are incorporated herein by reference) and chemical methods.

The molecular typing methods of the present invention rely on detecting the presence in a sample of specific polynucleotide sequences in regions of the genome of group B streptococci (GBS) that we have identified as varying between different types.

More specifically, the polynucleotide sequences to be detected lie within the cpsD, cpsE, cpsF, cpsG, cpsI, cpsM, rib, alp2 and/or alp3 genes of GBS, as well as the mobile genetic elements IS861, IS1548, IS1381, ISSa4 and GBSi1, and preferably within the cpsD, cpsE, cpsF, cpsG and/or cpsI/M genes.

Regions of interest within these genes are regions whose sequence varies between two or more types, i.e. regions that are heterogeneous. Heterogeneity may be due to insertions, deletions and/or substitutions between corresponding regions in different types. In the case of rib, alp2 and alp3, heterogeneity typically takes the form of the presence or absence of the entire gene. Similarly, for the elements IS861, IS1548, IS1381, ISSa4 and GBSi1, heterogeneity typically takes the form of the presence or absence of the entire sequence.

Specific regions of heterogeneity include the following positions: cpsD gene, 62 and 78-86; cpsD-cpsE gene spacer, 138, 139 and 144; cpsE gene, 198, 204, 211, 218, 240, 249, 300, 321, 419, 429, 437, 457, 466, 486, 602, 606, 627, 636, 645, 803, 971, 1026, 1044, 1173, 1194, 1251, 1278, 1413, 1495, 1500, 1501, 1512, 1518 and 1527; cpsF gene, 1595, 1611, 1620, 1627, 1629, 1655, 1832, 1856, 1866, 1871, 1892 and 1971; and cpsG gene, 2026, 2088, 2134, 2187 and 2196 (numbering corresponds to that shown in Figure 1).
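
By way of illustration only, the positions listed above can be held in a simple lookup structure, for example when automating the analysis of sequence data. The following Python sketch assumes Figure 1 numbering throughout; the names `HETEROGENEOUS_POSITIONS`, `region_of` and `positions_in_window` are hypothetical and are not part of the described method.

```python
from typing import Optional

# Hypothetical lookup table (not from the source): heterogeneous positions
# as listed above, keyed by region, in Figure 1 numbering. The run 78-86
# in cpsD is expanded into individual positions.
HETEROGENEOUS_POSITIONS: dict[str, list[int]] = {
    "cpsD": [62, *range(78, 87)],
    "cpsD-cpsE spacer": [138, 139, 144],
    "cpsE": [198, 204, 211, 218, 240, 249, 300, 321, 419, 429, 437, 457, 466,
             486, 602, 606, 627, 636, 645, 803, 971, 1026, 1044, 1173, 1194,
             1251, 1278, 1413, 1495, 1500, 1501, 1512, 1518, 1527],
    "cpsF": [1595, 1611, 1620, 1627, 1629, 1655, 1832, 1856, 1866, 1871,
             1892, 1971],
    "cpsG": [2026, 2088, 2134, 2187, 2196],
}


def region_of(position: int) -> Optional[str]:
    """Return the region containing a listed heterogeneous position, else None."""
    for region, positions in HETEROGENEOUS_POSITIONS.items():
        if position in positions:
            return region
    return None


def positions_in_window(start: int, end: int) -> list[int]:
    """Return all listed heterogeneous positions in [start, end], sorted."""
    return sorted(
        p
        for positions in HETEROGENEOUS_POSITIONS.values()
        for p in positions
        if start <= p <= end
    )
```

For instance, `positions_in_window(1413, 2196)` returns the 24 positions that fall within the 790 bp cpsE-cpsF-cpsG fragment.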

Particularly preferred positions of interest are those that lie within a 790 bp fragment of cpsE-cpsF-cpsG (which extends from approximately the 3' 136 bases of cpsE to the 5' 218 bases of cpsG), namely positions 1413, 1495, 1500, 1501, 1512, 1518, 1527, 1595, 1611, 1620, 1627, 1629, 1655, 1832, 1856, 1866, 1871, 1892, 1971, 2026, 2088, 2134, 2187 and 2196 as shown in Figure 1.

Other regions of heterogeneity are position 62 of cpsD and a repetitive sequence (TTACGGCGA) found at positions 78 to 86 of cpsD in some, but not all, GBS serotypes.
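
A minimal Python sketch of how these two cpsD features might be read from sequence data is given below. It assumes that the cpsD sequence has already been aligned to the Figure 1 numbering, so that position n of the string corresponds to position n of Figure 1; the function names are hypothetical, and the alignment step itself is not shown.

```python
# Repetitive sequence reported at cpsD positions 78-86 (Figure 1 numbering).
REPEAT = "TTACGGCGA"


def base_at(aligned_seq: str, position: int) -> str:
    """Hypothetical helper: base at a 1-based position of a sequence
    already aligned to Figure 1 numbering."""
    return aligned_seq[position - 1].upper()


def has_cpsd_repeat(aligned_seq: str, start: int = 78, end: int = 86) -> bool:
    """Hypothetical helper: True if positions start..end (1-based, inclusive)
    carry the TTACGGCGA repeat."""
    return aligned_seq[start - 1:end].upper() == REPEAT
```

A sample whose aligned sequence lacks the repeat (for example, one with alignment gaps at positions 78 to 86) returns False, and `base_at(seq, 62)` gives the base at the other heterogeneous cpsD position.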

Specific regions of heterogeneity also include a number of positions within the cpsI/M gene, as shown in the sequence alignment depicted in Figure 3.

These regions of heterogeneity may be analysed using a variety of means 
including sequencing, PCR and binding of labelled probes. 

In the case of sequencing to identify serotype, the sequencing primers are selected such that they hybridise specifically within or near a region of heterogeneity. The primers need not be specific to particular serotypes, since it is the sequence information obtained during sequencing that is used to assign the molecular serotype. Thus the primers may hybridise specifically to all GBS serotypes (at least serotypes Ia to VII) or to specific serotypes.
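
Because it is the sequence read that carries the typing information, assignment can be expressed as a comparison of the bases observed at heterogeneous positions with reference bases for each type. The Python sketch below is illustrative only: the allele table `toy_profiles` is invented (the actual alleles are those shown in Figure 1, which is not reproduced here), the function name is hypothetical, and scoring by the number of agreeing positions is an assumed, deliberately simple rule.

```python
def assign_type(
    observed: dict[int, str],
    profiles: dict[str, dict[int, str]],
) -> tuple[list[str], int]:
    """Hypothetical helper: return the type(s) whose reference bases agree
    with the observed bases at the most positions, and that number.

    observed: Figure 1 position -> base read from the sequencing trace.
    profiles: type name -> {Figure 1 position -> reference base, uppercase}.
    Ties are all returned, so an ambiguous call remains visible.
    """
    best_types: list[str] = []
    best_score = -1
    for type_name, reference in profiles.items():
        score = sum(
            1 for pos, base in observed.items()
            if reference.get(pos) == base.upper()
        )
        if score > best_score:
            best_types, best_score = [type_name], score
        elif score == best_score:
            best_types.append(type_name)
    return best_types, best_score


# Invented two-type allele table, for illustration only.
toy_profiles = {
    "type A": {1413: "A", 1495: "G", 1595: "T"},
    "type B": {1413: "G", 1495: "G", 1595: "C"},
}
```

With this toy table, `assign_type({1413: "G", 1495: "G", 1595: "C"}, toy_profiles)` returns `(["type B"], 3)`, whereas a read covering only position 1495 cannot separate the two types and returns both.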

Preferred primers anneal within 100, 50 or 20 contiguous nucleotides of a heterogeneous position within the 790 bp region of cpsE-cpsF-cpsG shown in Figure 1. Examples of suitable sequencing primers are shown in Table 2 (cpsES3, cpsFA, cpsFS, cpsGA and cpsGA1).
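
The annealing criterion above can be checked with a short calculation once a primer's annealing span is expressed in Figure 1 coordinates. In the Python sketch below, the function name and the example span are hypothetical (the annealing coordinates of the Table 2 primers are not given here); only the heterogeneous positions are taken from the text.

```python
# Heterogeneous positions within the 790 bp cpsE-cpsF-cpsG region (Figure 1).
REGION_790BP = [1413, 1495, 1500, 1501, 1512, 1518, 1527, 1595, 1611, 1620,
                1627, 1629, 1655, 1832, 1856, 1866, 1871, 1892, 1971, 2026,
                2088, 2134, 2187, 2196]


def distance_to_nearest(start: int, end: int, positions: list[int]) -> int:
    """Hypothetical helper: distance in nucleotides from a primer's annealing
    span (1-based, inclusive start..end) to the closest heterogeneous
    position; 0 if the span itself covers one."""
    if not positions:
        raise ValueError("no heterogeneous positions supplied")
    distances = []
    for p in positions:
        if start <= p <= end:
            return 0
        distances.append(start - p if p < start else p - end)
    return min(distances)
```

A hypothetical primer annealing to positions 1540 to 1565 lies 13 nucleotides from position 1527, so `distance_to_nearest(1540, 1565, REGION_790BP) <= 20` holds and it would meet the strictest of the three preferences.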

PCR and other specific hybridisation-based serotyping methods will typically involve the use of nucleotide primers/probes which bind specifically to a region of the genome of a GBS serotype that includes a nucleotide which varies between two or more serotypes. Thus the primers/probes may comprise a sequence which is complementary to one of these regions. Where positions of heterogeneity are close together (e.g. positions 198, 204, 211 and 218 of cpsE), it may be desirable to use a primer/probe which hybridises specifically to a region of the GBS genome that comprises two or more positions of heterogeneity. For example, a primer/probe may be designed that is complementary to nucleotides 195 to 220 of cpsE. Such primers/probes are likely to have improved specificity and to reduce the likelihood of false positives.
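
As a sketch of the probe design just described, a probe complementary to a window of the target strand is the reverse complement of that window. The Python example assumes the target sequence is given 5' to 3' in Figure 1 coordinates; the function names are hypothetical, and the short test sequence is not a GBS sequence.

```python
# Watson-Crick complements for unambiguous bases (case preserved).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def probe_for_window(target: str, start: int, end: int) -> str:
    """Hypothetical helper: probe (written 5' to 3') complementary to
    positions start..end (1-based, inclusive) of the target strand."""
    if not 1 <= start <= end <= len(target):
        raise ValueError("window lies outside the target sequence")
    return reverse_complement(target[start - 1:end])


# Heterogeneous cpsE positions covered by a probe spanning 195-220.
covered = [p for p in (198, 204, 211, 218) if 195 <= p <= 220]
```

For a target aligned to Figure 1, `probe_for_window(target, 195, 220)` returns a 26-mer, and `covered` confirms that all four of positions 198, 204, 211 and 218 fall within it.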

PCR-based methods of detection may rely upon the use of primer pairs, at least one of which binds specifically to a region of interest in one or more, but not all, serotypes. Unless both primers bind, no PCR product will be obtained. Consequently, the presence or absence of a specific PCR product may be used to determine the presence of a sequence indicative of specific GBS serotypes. However, as mentioned, only one primer need correspond to a region of heterogeneity in the genes of interest (such as the cpsD, cpsE, cpsF, cpsG, cpsI and/or cpsM genes). The other primer may bind to a conserved or heterogeneous region within said gene, or even to a region within another part of the GBS genome, such as the cpsH gene, whether that region is conserved or heterogeneous between serotypes. Thus, for example, a combination of a primer (cpsGS) which binds to a region of the cpsG gene including positions 2172 to 2210 and a primer which binds to a heterogeneous region of the cpsH gene (IacpsHA1, IIIcpsHA) may be used as the basis for distinguishing serotypes Ia and III.
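
The presence/absence logic can be illustrated with a simple in silico PCR check: a product is predicted only if the forward primer matches the template and the reverse complement of the reverse primer matches further downstream. This Python sketch is a simplification that assumes exact matching on one template strand, with no mismatches or degenerate bases; real primer binding tolerates some mismatches, so it is no substitute for experimental PCR. The function names are hypothetical.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def virtual_pcr(template: str, forward: str, reverse: str) -> Optional[int]:
    """Hypothetical helper: predicted product length in bp, or None if either
    primer fails to bind (exact match only).

    template: top strand, 5' to 3'. forward/reverse: primers written 5' to 3'.
    """
    t = template.upper()
    f = forward.upper()
    f_pos = t.find(f)
    if f_pos == -1:
        return None  # forward primer does not bind: no product
    r_site = reverse_complement(reverse)
    r_pos = t.find(r_site, f_pos + len(f))
    if r_pos == -1:
        return None  # reverse primer does not bind downstream: no product
    return r_pos + len(r_site) - f_pos
```

Applied to template sequences from different serotypes, a pair in which one primer targets a serotype-specific region returns a length for some templates and None for others, mirroring the presence or absence of a band.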

Further, a primer which binds to a heterogeneous region of cpsI may be combined with a primer which binds to a constant region of cpsG. An example of such a primer pair is VIcpsIA and cpsGS1, which gives rise to a 1517 bp PCR product that is specific for GBS serotype VI.

Alternatively, primers that bind to conserved regions of the GBS genome but which flank a region whose length varies between serotypes may be used. In this case, a PCR product will always be obtained when GBS bacteria are present, but the size of the PCR product varies between serotypes.

Furthermore, a combination of specific binding of one or both primers and variation in the length of the PCR product may be used as a means of identifying particular molecular serotypes.
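
Where the product size rather than its presence carries the information, the observed band size can be compared with the expected size for each serotype. The expected sizes in `toy_sizes` and the 20 bp tolerance in this Python sketch are hypothetical placeholders, not values from this description; in practice they would come from the sequences of the amplified region and the resolution of the sizing method. The function name is also hypothetical.

```python
def call_by_size(
    observed_bp: int,
    expected_bp: dict[str, int],
    tolerance_bp: int = 20,
) -> list[str]:
    """Hypothetical helper: serotypes whose expected product size lies within
    tolerance_bp of the observed size, closest first; [] means no match."""
    matches = [
        (abs(observed_bp - size), name)
        for name, size in expected_bp.items()
        if abs(observed_bp - size) <= tolerance_bp
    ]
    return [name for _, name in sorted(matches)]


# Invented expected sizes, for illustration only.
toy_sizes = {"serotype X": 480, "serotype Y": 520, "serotype Z": 650}
```

A result with more than one name, as for a 500 bp band with this toy table, signals that two expected sizes are too close to resolve, which is where combining size with specific primer binding helps.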

Examples of specific primers/probes which target the cpsD, cpsE, cpsF, cpsG, cpsI or cpsM genes include the following:

cpsDS GCA AAA GAA CAG ATG GAA CAA AGT GG
cpsES CTT TTG GAG TCG TGG CTA TCT TG
cpsEA1 GA/T/GA AAA AAG GAA AGT CGT GTC G/ATT G
cpsES1 CTT GGA C/TTC CTC TGA AAA GGA TTG
cpsEA2 AAA A/CGC TTG ATC AAC AGT TAA GCA GG
cpsES2 GAT GGT/C GGA CCG GCT ATC TTT TCT C
cpsEA3 CTT AAT TTG TTC TGC ATC TAC TCG C
cpsES3 GTT AGA TGT TCA ATA TAT CAA TGA ATG GTC TAT TTG GTC AG
cpsEFA CCT TTC AAA CCT TAC CTT TAC TTA GC
cpsFS CAT CTG GTG CCG CTG TAG CAG TAC CAT T
cpsFA GTC GAA AAC CTC TAT A/GT A AAC/T GGT CTT ACA A/GCC AAA TAA CTT ACC
cpsGA AAG/C AGT TCA TAT CAT CAT ATG AGA G
cpsGA1 CCG CCA/G TGT GTG ATA ACA ATC TCA GCT TC
cpsGS ATG ATG ATA TGA ACT CTT ACA TGA AAG AAG CTG AGA TTG
cpsGS1 GAA CTC TTA CAT GAA AGA AGC TGA GAT TGT TAT CAC AC
IbcpsIA CTA TCA ATG AAT GAG TCT GTT GTA GGA CGG ATT GCA CG
IbcpsIS GAT AAT AGT GGA GAA ATT TGT GAT AAT TTA TCT CAA AAA GAC G
IbcpsIA1 CCT GAT TCA TTG CAG AAG TCT TTA CGA TGC GAT AGG TG
IVcpsMA GGG TCA ATT GTA TCG TCG CTG TCA ACA AAA CCA ATC AAA TC
VcpsMA CCC CCC ATA AGT ATA AAT AAT ATC CAA TCT TGC ATA GTC AG
VIcpsIA GAA GCA AAG ATT CTA CAC AGT TCT CAA TCA CTA ACT CCG
cpsIA GTA TAA CTT CTA TCA ATG GAT GAG TCT GTT GTA GTA CGG

The primer designations correspond to those given in Table 2.
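
Several primers above contain alternative bases written with a slash (for example, C/T). Reading "X/Y" as alternative bases at a single position is an assumption about this notation; on that basis, the Python sketch below converts such primers to standard IUPAC ambiguity codes and counts the distinct sequences a degenerate primer represents. The function names are hypothetical.

```python
# IUPAC nucleotide ambiguity codes, keyed by the set of bases they represent.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def parse_slash_notation(primer: str) -> list[frozenset[str]]:
    """Hypothetical helper: split a primer into positions, reading 'X/Y'
    (or 'X/Y/Z') as alternative bases at one position. Spaces are ignored."""
    s = primer.replace(" ", "").upper()
    positions: list[frozenset[str]] = []
    i = 0
    while i < len(s):
        bases = {s[i]}
        i += 1
        while i + 1 < len(s) and s[i] == "/":
            bases.add(s[i + 1])
            i += 2
        if not bases <= set("ACGT"):
            raise ValueError(f"unexpected character(s) in {primer!r}")
        positions.append(frozenset(bases))
    return positions


def to_iupac(primer: str) -> str:
    """Primer rewritten with one IUPAC code per position."""
    return "".join(IUPAC[p] for p in parse_slash_notation(primer))


def n_variants(primer: str) -> int:
    """Number of distinct unambiguous sequences the primer represents."""
    count = 1
    for bases in parse_slash_notation(primer):
        count *= len(bases)
    return count
```

Under this reading, the cpsFA sequence has three two-way positions and so represents eight distinct sequences, and the opening "GA/T/GA" of the cpsEA1 sequence becomes G, D (A, G or T), A.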

In relation to the alp2, alp3 and rib surface protein antigen genes, heterogeneity and protein antigen gene subtype are assessed mainly at the level of whether a group B streptococcal bacterium contains the gene or not. Our results show that the specific combination of surface protein genes present in a GBS genome is indicative of serotype/serosubtype (see Table 9). Consequently, primers/probes suitable for use in the methods of the present invention are those that are specific for the particular genes, and probes/primers that are specific for alp2, alp3 or rib are preferred. Figure 4 shows an alignment of alp2 and alp3 that was used to design primers specific for alp2 or specific for alp3.
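
Primers specific for one of two closely related genes are usually placed over positions where the aligned sequences differ, and allele-specific primers discriminate best when a mismatch falls at or near the 3' end. The Python sketch below, which is not taken from the Examples, lists the differing columns of a pairwise alignment such as that in Figure 4 and finds the window of a chosen width containing the most differences. The function names are hypothetical, and the test sequences are invented because Figure 4 is not reproduced here.

```python
def discriminating_columns(aln_a: str, aln_b: str) -> list[int]:
    """Hypothetical helper: 1-based alignment columns at which two aligned
    sequences differ. Gap characters ('-') count as differences."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    return [
        col
        for col, (a, b) in enumerate(zip(aln_a.upper(), aln_b.upper()), start=1)
        if a != b
    ]


def most_divergent_window(aln_a: str, aln_b: str, width: int) -> tuple[int, int]:
    """Hypothetical helper: (start column, number of differences) for the
    first window of the given width containing the most differing columns."""
    if not 1 <= width <= len(aln_a):
        raise ValueError("width must be between 1 and the alignment length")
    diffs = set(discriminating_columns(aln_a, aln_b))
    best_start, best_count = 1, -1
    for start in range(1, len(aln_a) - width + 2):
        count = sum(1 for col in range(start, start + width) if col in diffs)
        if count > best_count:
            best_start, best_count = start, count
    return best_start, best_count
```

A window found this way is only a starting point: primer length, melting temperature and the position of the mismatches relative to the 3' end would still need to be checked.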

Examples of specific primers/probes which target the alp2, alp3 and rib genes include the following:

bcaS1 GGT AAT CTT AAT ATT TTT GAA GAG TCA ATA GTT GCT GCA TCT AC
bcaS2 CCAGGGA GTG CAG CGA CCT TAA ATA CAA GCA TC
balS GAT CCT CAA AAC CTC ATT GTA TTA AAT CCA TCA AGC TAT TC
balA CCA GTT AAG ACT TCA TCA CGA CTC CCA TCA C
bal23S1 CAG ACT GTT AAA GTG GAT GAA GAT ATT ACC TTT ACG G
bal23S2 CTT AAA GCT AAG TAT GAA AAT GAT ATC ATT GGA GCT CGT G
bal2S CTT CCG CCA GAT AAA ATT AAG
bal2A CTG TTG ACT TAT CTG GAT AGG TC
bal2A1 CGT GTT GTT CAA CAG TCC TAT GCT TAG CCT CTG GTG
bal2A2 GGT ATC TGG TTT ATG ACC ATT TTT CCA GTT ATA CG
bal3S GTT CTT CCG CTT AAG GAT AG
bal3A GAC CGT TTG GTC CTT ACC TTT TGG TTC GTT GCT ATC C
ribS2 GAAGTAATTTCAG GAA GTG CTG TTA CGT TAA ACA CAA ATA TG
ribA1 GAA GGT TGT GTG AAA TAA TTG CCG CCT TGC CTA ATG
ribA2 AAT ACT AGC TGC ACC AAC AGT AGT CAA TTC AGA AGG

The primer designations correspond to those given in Table 6.

In relation to IS861, IS1548, IS1381, ISSa4 and GBSi1, heterogeneity and subtype are assessed mainly at the level of whether a group B streptococcal bacterium contains the element or not. The number of elements present may also be assessed. Our results show that the specific combination of mobile elements present in a GBS genome is indicative of serotype/serosubtype (see Table 12). Consequently, primers/probes suitable for use in the methods of the present invention are those that are specific for the particular mobile genetic elements, and probes/primers that are specific for IS861, IS1548, IS1381, ISSa4 and GBSi1 are preferred.
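
Because subtyping here rests on which elements are present, the results of the element-specific PCRs can be summarised as a fixed-order presence/absence profile together with the number of elements detected, which can then be compared with the combinations reported in Table 12. The Python sketch below is illustrative only; the profile format and the function name are hypothetical.

```python
# Mobile genetic elements named in the text, in a fixed reporting order.
ELEMENTS = ("IS861", "IS1548", "IS1381", "ISSa4", "GBSi1")


def mge_profile(detected: dict[str, bool]) -> tuple[str, int]:
    """Hypothetical helper: presence/absence string in a fixed element order,
    plus the number of elements detected. Every element needs a result."""
    missing = [e for e in ELEMENTS if e not in detected]
    if missing:
        raise ValueError("no result for: " + ", ".join(missing))
    profile = " ".join(e + ("+" if detected[e] else "-") for e in ELEMENTS)
    return profile, sum(detected[e] for e in ELEMENTS)
```

Fixing the element order means two isolates with the same combination always produce identical profile strings, so profiles can be compared or tabulated directly.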

Examples of specific primers/probes which target IS861, IS1548, IS1381, ISSa4 and GBSi1 include the following:

IS861S GAG AAA ACA AGA GGG AGA CCG AGT AAA ATG GGA CG
IS861A1 CAC GAT TTC GCA GTT CTA AAT AAA TCC GAC GAT AGC C
IS861A2 CAA ACT CCG TCA CAT CGG TAT AGC ACT TCT CAT AGG
IS1548S CTA TTG ATG ATT GCG CAG TTG AAT TGG ATA GTC GTC
IS1548S1 GTT TGG GAC AGG TAG CGG TTG AGG AGA AAA GTA ATG
IS1548A1 CAT TAC TTT TCT CCT CAA CCG CTA CCT GTC CCA AAC
IS1548A2 CCC AAT ACC ACG TAA CTT ATG CCA TTT G
IS1548A3 CGT GTT ACG AGT CAT CCC AAT ACC ACG TAA CTT ATG CC
IS1381S1 CTT ATG AAC AAA TTG CGG CTG ATT TTG GCA TTC ACG
IS1381S2 GGC TCA GGC GAT TGT CAC AAG CCA AGG GAG
IS1381A CTA AAA TCC TAG TTC ACG GTT GAT CAT TCC AGC
ISSa4S CGT ATC TGT CAC TTA TTT CCC TGC GGG TGT CTC C
ISSa4A1 GCC GAT GTC ACA ACA TAG TTC AGG ATA TAG CCA G
ISSa4A2 CGT AAA GGA GTC CAA AGA TGA TAG CCT TTT TGA ACC
GBSi1S1 CAT CTC GGA ACA ATA TGC TCG AAG CTT ACA AGC AAG TG
GBSi1S2 GGG GTC ACT ATC GAG CAG ATG GAT GAC TAT CTT CAC
GBSi1A1 AAT GGC TGT TTC GCA GGA GCG ATT GGG TCT GAA CC
GBSi1A2 CCA GGG ACA TCA ATC TGT CTT GCG GAA CAG TAT CG



Preferably, the primers/probes comprise at least 10 or 15 nucleotides. Typically, primers/probes consist of fewer than 100, 50 or 30 nucleotides. Primers/probes are generally polynucleotides comprising deoxynucleotides. They may also be polynucleotides which include synthetic or modified nucleotides. A number of different types of modification to oligonucleotides are known in the art. These include methylphosphonate and phosphorothioate backbones and the addition of acridine or polylysine chains at the 3' and/or 5' ends of the molecule. For the purposes of the present invention, it is to be understood that the polynucleotides described herein may be modified by any method available in the art. Primers/probes may be labelled with any suitable detectable label, such as radioactive atoms, fluorescent molecules or biotin.

In one embodiment, primers/probes have a high melting temperature, above 70°C, so that they may be used in rapid cycle PCR.
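
For a first screen against the 70°C criterion, primer melting temperature can be estimated from length and G+C content using the widely quoted basic approximation Tm = 64.9 + 41 × (nG + nC - 16.4) / N. This formula ignores salt and primer concentration, and nearest-neighbour thermodynamic methods, which can differ from it by several degrees, are preferable for final design. It is used in the Python sketch below only as an assumption for illustration, not as the method by which the primers described here were designed; the function names are hypothetical.

```python
def approx_tm(primer: str) -> float:
    """Hypothetical helper: basic G+C-content estimate of melting temperature
    in degrees C, Tm = 64.9 + 41 * (nG + nC - 16.4) / N, for an
    unambiguous A/C/G/T sequence (spaces ignored)."""
    seq = primer.replace(" ", "").upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("expected an unambiguous A/C/G/T sequence")
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41 * (gc - 16.4) / len(seq)


def passes_rapid_cycle_threshold(primer: str, threshold_c: float = 70.0) -> bool:
    """True if the estimated Tm exceeds the threshold (default 70 degrees C)."""
    return approx_tm(primer) > threshold_c
```

Degenerate primers would first need to be expanded into their individual sequences, since the function rejects ambiguous bases.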

Compositions comprising a plurality of polynucleotides that are used to analyse one or more regions within the cpsD, cpsE, cpsF, cpsG or cpsI/M genes may further comprise polynucleotides that may be used to analyse one or more regions within the cpsH gene. Suitable polynucleotides are described in the Examples, although a person skilled in the art could design other suitable sequences based on the sequence alignment shown in Figure 3.

Further, compositions comprising a plurality of polynucleotides that are used to analyse one or more regions within the alp2, alp3 or rib genes may further comprise polynucleotides that may be used to analyse one or more regions within the C alpha (bca) and C beta (bac) genes.

A variety of techniques may be used to analyse one or more regions within the genome of a bacterium of interest. Typically, a sample suspected of containing GBS bacteria, which may be a bacterial culture or a clinical sample from a patient, is treated using standard techniques to obtain genomic DNA from any microorganisms present in the sample. For some subsequent detection steps, it may be desirable to use nucleic acid preparation techniques that result in substantial fragmentation of the genomic DNA.

The genomic DNA is then subjected to one or more analysis steps, which may include sequencing, enzymatic amplification and/or hybridisation. These general techniques of DNA analysis are known in the art and are discussed in detail in, for example, Sambrook et al. (1989) and Ausubel et al. (1999), supra.

Serotyping may involve one or more steps. For example, it may be desirable to carry out an initial step of determining whether nucleotide sequences that are conserved between GBS serotypes, but not found in any other organism, are present in the sample. This may be achieved by using PCR primers that detect all GBS bacteria but no other organisms (e.g. using primer pairs Sag59/Sag190 and/or DSF2/DSR1; see Tables 2 and 3).

Molecular serotyping for specific GBS serotypes can then be performed by detecting the presence of one or more regions of heterogeneity in the regions of interest, using any suitable technique such as sequencing, enzymatic amplification and/or hybridisation based on the probes/primers discussed above.

A particularly preferred detection technique is PCR, such as rapid cycle PCR (Kong et al., 2000).

An example of a multi-step serotyping strategy (algorithm) is shown in Figure 2. However, a variety of other strategies are envisaged and can be designed by the skilled person using the sequence heterogeneity information presented herein. In particular, it is preferred that the serotyping procedure comprises at least one analysis step based on analysing one or more regions of the cpsD, cpsE, cpsF, cpsG and/or cpsI/M genes. This analysis may optionally be combined with an analysis of one or more regions within the cpsH gene. Similar techniques may be used to analyse the cpsH gene regions, and suitable primer sequences and methods are also described in the Examples.
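
A two-step strategy of the kind described (an initial GBS-specific PCR followed by serotype-specific analyses) can be expressed as a small decision function. The Python sketch below does not reproduce the algorithm of Figure 2; its inputs are assumed to be already-interpreted positive/negative results, and the function name and result wording are hypothetical.

```python
def type_sample(gbs_detected: bool, serotype_hits: dict[str, bool]) -> str:
    """Hypothetical helper combining a GBS-specific screening result with
    serotype-specific results.

    gbs_detected: outcome of the initial GBS-specific PCR.
    serotype_hits: serotype name -> whether its specific test was positive.
    """
    if not gbs_detected:
        return "GBS not detected"
    positives = sorted(name for name, hit in serotype_hits.items() if hit)
    if not positives:
        return "GBS detected; molecular serotype not determined"
    if len(positives) == 1:
        return f"GBS serotype {positives[0]}"
    # More than one positive: report rather than guess, so that a further
    # discriminating test (e.g. sequencing) can be applied.
    return "GBS detected; ambiguous between " + ", ".join(positives)
```

Screening first avoids interpreting serotype-specific results from samples that contain no GBS, and reporting ambiguous results explicitly leaves room for a follow-up step.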

Analysis of the presence or absence of the alp2, alp3 and/or rib genes may optionally be combined with an analysis of the presence or absence of C alpha (bca gene) and C beta (bac gene) gene sequences, as described in the Examples. Similar techniques may be used to analyse these regions, and suitable primer sequences and PCR methods are also described in the Examples.

Furthermore, analysis of the presence or absence of the alp2, alp3 and/or rib genes (and optionally the bca and bac genes) may be combined with an analysis of the presence or absence of mobile genetic elements.

Thus a typing strategy may involve an analysis of cps genes, surface protein genes and/or mobile genetic elements in various combinations to provide more detailed serosubtyping and subtyping information.
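
One way to report such a combined analysis is a composite label built from the molecular serotype, the surface protein genes detected and the mobile genetic elements detected. The label format and function name in this Python sketch are hypothetical; they merely show how the three analyses can be kept together in a reproducible order.

```python
def composite_type(
    serotype: str,
    protein_genes: dict[str, bool],
    mobile_elements: dict[str, bool],
) -> str:
    """Hypothetical helper: combine cps serotype, surface protein genes and
    mobile genetic elements into one label. Names are sorted so that equal
    profiles always give equal labels."""
    proteins = ",".join(sorted(g for g, present in protein_genes.items() if present))
    elements = ",".join(sorted(e for e, present in mobile_elements.items() if present))
    return f"{serotype} | proteins: {proteins or 'none'} | MGE: {elements or 'none'}"
```

Sorting the detected names makes the label independent of the order in which individual PCR results were recorded.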

Analysis of GBS genomic sequences using the above techniques may take place in solution, followed by standard resolution using methods such as gel electrophoresis. However, in a preferred aspect of the invention, the primers/probes are immobilised onto a solid substrate to form arrays.

The polynucleotide probes are typically immobilised onto or in discrete regions of a solid substrate. The substrate may be porous, to allow immobilisation within the substrate, or substantially non-porous, in which case the probes are typically immobilised on the surface of the substrate. Examples of suitable solid substrates include flat glass (such as borosilicate glass), silicon wafers, mica, ceramics and organic polymers such as plastics, including polystyrene and polymethacrylate. It may also be possible to use semi-permeable membranes such as nitrocellulose or nylon membranes, which are widely available. The semi-permeable membranes may be mounted on a more robust solid surface such as glass. The surfaces may optionally be coated with a layer of metal, such as gold, platinum or another transition metal.

Preferably, the solid substrate is a material having a rigid or semi-rigid surface. In preferred embodiments, at least one surface of the substrate will be substantially flat, although in some embodiments it may be desirable to physically separate synthesis regions for different polymers with, for example, raised regions or etched trenches. It is also preferred that the solid substrate is suitable for the high-density application of DNA sequences in discrete areas of typically from 50 to 100 µm, giving a density of 10,000 to 40,000 areas per cm².
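
The density figures follow from simple geometry. Assuming, for this check only, that the areas are laid out on a square grid with a centre-to-centre spacing equal to the area size (no gap between areas), 1 cm = 10,000 µm gives 100 areas per cm at 100 µm and 200 at 50 µm, hence 10,000 and 40,000 areas per cm². The same arithmetic as a Python sketch, with a hypothetical function name:

```python
def areas_per_cm2(pitch_um: float) -> float:
    """Hypothetical helper: areas per cm² for a square grid with the given
    centre-to-centre spacing in micrometres (1 cm = 10,000 µm)."""
    if pitch_um <= 0:
        raise ValueError("pitch must be positive")
    per_cm = 10_000 / pitch_um
    return per_cm * per_cm
```

Any gap left between areas increases the effective pitch and lowers the density accordingly.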

The solid substrate is conveniently divided up into sections. This may be achieved by techniques such as photoetching, or by the application of hydrophobic inks, for example Teflon-based inks (Cel-line, USA). The discrete positions at which the different probes are located may have any convenient shape, e.g. circular, rectangular, elliptical or wedge-shaped.

Attachment of the library sequences to the substrate may be by covalent or non-covalent means. The library sequences may be attached to the substrate via a layer of molecules to which the library sequences bind. For example, the probes may be labelled with biotin and the substrate coated with avidin and/or streptavidin. A convenient feature of using biotinylated probes is that the efficiency of coupling to the solid substrate can be determined easily. Since the polynucleotide probes may bind only poorly to some solid substrates, it is often necessary to provide a chemical interface between the solid substrate (such as in the case of glass) and the probes. Thus, the surface of the substrate may be prepared by, for example, coating with a chemical that increases or decreases the hydrophobicity, or coating with a chemical that allows covalent linkage of the polynucleotide probes. Some chemical coatings may both alter the hydrophobicity and allow covalent linkage. Hydrophobicity on a solid substrate may readily be increased by silane treatment or other treatments known in the art. Examples of suitable chemical coatings include polylysine and poly(ethyleneimine). Further details of methods for the attachment of probes are provided in US Patent No. 6,248,521. Methods for immobilising nucleic acids by introduction of various functional groups to the molecules are also described in Bischoff et al., 1987 (Anal. Biochem. 164:336-344) and Kremsky et al., 1987 (Nucl. Acids Res. 15:2891-2910).

Techniques for producing immobilised arrays of nucleic acid molecules
have been described in the art. A useful review is provided in Schena et al., 
1998, TibTech 16: 301-306, which also gives references for the techniques 
described therein. 

Microarray-manufacturing technologies fall into two main categories: synthesis and delivery. In the synthesis approaches, microarrays are prepared in a stepwise fashion by the in situ synthesis of nucleic acids from biochemical building blocks. With each round of synthesis, nucleotides are added to growing chains until the desired length is achieved. A number of prior art methods describe how to synthesise single-stranded nucleic acid molecule libraries in situ, using for example masking techniques (photolithography) to build up various permutations of sequences at the various discrete positions on the solid substrate. U.S. Patent No. 5,837,832 describes an improved method for producing DNA arrays immobilised to silicon substrates based on very large scale integration technology. In particular, U.S. Patent No. 5,837,832 describes a strategy called "tiling" to synthesise specific sets of probes at spatially defined locations on a substrate, which may be used to produce the immobilised DNA libraries of the present invention. U.S. Patent No. 5,837,832 also provides references for earlier techniques that may also be used.

The delivery technologies, by contrast, use the exogenous deposition of
pre-prepared biochemical substances for chip fabrication. For example, DNA
may be printed directly onto the substrate using robotic
devices equipped with either pins (mechanical microspotting) or piezoelectric
devices (ink jetting). In mechanical microspotting, a biochemical sample is
loaded into a spotting pin by capillary action, and a small volume is transferred
to a solid surface by physical contact between the pin and the solid substrate. 
After the first spotting cycle, the pin is washed and a second sample is loaded 
and deposited to an adjacent address. Robotic control systems and multiplexed 
printheads allow automated microarray fabrication. Ink jetting involves loading 
a biochemical sample, such as a polynucleotide, into a miniature nozzle
equipped with a piezoelectric fitting; an electrical current is then used to expel a
precise amount of liquid from the jet onto the substrate. After the first jetting
step, the jet is washed and a second sample is loaded and deposited to an
adjacent address. A repeated series of cycles with multiple jets enables rapid
microarray production.

In one embodiment, the microarray is a high density array, comprising
greater than about 50, preferably greater than about 100 or 200, different nucleic
acid probes. Such high density arrays have a probe density of greater than
about 50, preferably greater than about 500, more preferably greater than about
1,000, and most preferably greater than about 2,000 different nucleic acid probes
per cm². The array may further comprise mismatch control probes and/or
reference probes (such as positive controls).
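As a worked illustration of the thresholds above, probe density is the number of different probes divided by the array area in cm². The Python sketch below assumes a rectangular printed area with dimensions given in millimetres; the function names and tier labels are hypothetical and simply restate the "greater than about" thresholds in the text.

```python
def probe_density_per_cm2(n_probes: int, width_mm: float, height_mm: float) -> float:
    """Different probes per cm^2 for a rectangular printed area (1 cm^2 = 100 mm^2)."""
    area_cm2 = (width_mm * height_mm) / 100.0
    return n_probes / area_cm2


def density_tier(density: float) -> str:
    """Hypothetical labels for the 'greater than about' thresholds quoted in the text."""
    for threshold, label in ((2000, "most preferred"), (1000, "more preferred"),
                             (500, "preferred"), (50, "high density")):
        if density > threshold:
            return label
    return "below high-density threshold"


# Hypothetical example: 240 different probes printed within a 5 mm x 4 mm area.
d = probe_density_per_cm2(240, 5.0, 4.0)
print(d, density_tier(d))
```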

Microarrays of the invention will typically comprise a plurality of
primers/probes as described above. The primers/probes may be grouped on the
array in any order. However, it may be desirable to group primers/probes
according to the types (capsular polysaccharide gene serotypes, serosubtypes;
protein antigen gene subtypes; mobile genetic element subtypes), or groups of
types, for which they are specific. Such grouping may be arranged so that the
resulting patterns are readily amenable to pattern recognition by computer software.
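One way to realise such grouping is to give each category of probe its own contiguous block of array addresses, so that the hybridisation signal of each category appears as a separate block. The Python sketch below is a hypothetical layout helper, not a method taken from this specification: the probe identifiers, the category names, alphabetical ordering of blocks and the rule that each block starts on a new row are all assumptions.

```python
from collections import defaultdict


def group_layout(probes, n_cols):
    """Assign (row, col) addresses so that probes of one category are contiguous.

    probes: list of (probe_id, category) pairs.
    Each category (in alphabetical order) starts on a new row, so its
    hybridisation pattern appears as its own block on the array.
    """
    by_category = defaultdict(list)
    for probe_id, category in probes:
        by_category[category].append(probe_id)

    layout, row = {}, 0
    for category in sorted(by_category):
        ids = by_category[category]
        for i, probe_id in enumerate(ids):
            layout[probe_id] = (row + i // n_cols, i % n_cols)
        row += (len(ids) + n_cols - 1) // n_cols  # rows occupied by this block
    return layout


# Hypothetical probe names and categories.
probes = [("cps_Ia_1", "cps serotype"), ("rib_1", "protein gene"),
          ("cps_III_1", "cps serotype"), ("IS1381_1", "mobile element")]
print(group_layout(probes, n_cols=2))
```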

Elements in an array may contain only one type of probe/primer or a
number of different probes/primers.

Detection of binding of GBS genomic DNA to immobilised
probes/primers may be performed using a number of techniques. For example,
the immobilised probes, which are specific to a number of types (capsular
polysaccharide gene serotypes, serosubtypes; protein antigen gene subtypes;
mobile genetic element subtypes), may function as capture probes. Following
binding of the genomic DNA to the array, the array is washed and incubated
with one or more labelled detection probes that hybridise specifically to
conserved regions of the GBS genome. The binding of these
detection probes may then be determined by detecting the presence of the label.
For example, the label may be a fluorescent label and the array may be placed
in an X-Y reader under a charge-coupled device (CCD) camera.
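Once spot intensities have been read, each capture spot must be called positive or negative. The specification does not state a calling rule, so the Python sketch below assumes one common convention: a spot is positive when its intensity exceeds the mean of the negative-control spots plus k standard deviations (k = 3 by default). The spot names and intensity values are hypothetical.

```python
from statistics import mean, stdev


def call_spots(intensities, negative_controls, k=3.0):
    """Call each spot positive if it exceeds mean(neg) + k * SD(neg).

    intensities: dict of spot name -> fluorescence intensity.
    negative_controls: intensities of at least two negative-control spots.
    The cut-off rule is an assumption, not one stated in the specification.
    """
    cutoff = mean(negative_controls) + k * stdev(negative_controls)
    return {spot: value > cutoff for spot, value in intensities.items()}


# Hypothetical readings from three capture spots and three negative controls.
calls = call_spots({"cps_Ia": 850, "cps_Ib": 120, "rib": 400}, [100, 110, 90])
print(calls)
```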

Other techniques include labelling the genomic DNA prior to contact 
with the array (for example, by nick translation using labelled dNTPs).
Binding of the genomic DNA can then be detected directly. 

It is also possible to employ a single PCR amplification step using
labelled dNTPs. In this embodiment, the genomic DNA fragment binds to a
first primer present in the array. The addition of polymerase, dNTPs (including
some labelled dNTPs) and a second primer results in synthesis of a PCR product
incorporating labelled nucleotides. The labelled PCR fragment captured on the
plate may then be detected.

A number of available detection techniques do not require labels but
instead rely on changes in mass upon ligand binding (e.g. surface plasmon
resonance, SPR). The principles of SPR and the types of solid substrates
required for use in SPR (e.g. BIAcore chips) are described in Ausubel et al.,
1999, supra.

C. Uses 

As discussed above, group B streptococcus (GBS; Streptococcus
agalactiae) is the commonest cause of neonatal and obstetric sepsis and an
increasingly important cause of septicaemia in the elderly and
immunocompromised patients. Thus, the detection methods, probes/primers
and microarrays of the invention may be used in the diagnosis of GBS
infections in pregnant women and in elderly and/or immunocompromised patients.
The PCR and microarray techniques described herein may be of particular use 
in diagnosing infections in pregnant women due to increased accuracy and 
sensitivity compared to conventional identification and serotyping. They are 
also likely to give faster results since it will not generally be necessary to
culture clinical samples to obtain enough material. Further, the molecular
techniques can be used in most laboratories without the need for specialist 
expertise or reagents. 

The molecular typing methods of the invention may also assist in 
comprehensive strain identification that will be useful for epidemiological and 
other related studies that will be needed to monitor GBS isolates before and
after introduction of GBS conjugate vaccines. 

The present invention will now be described in more detail with 
reference to the following examples, which are illustrative only and non- 
limiting. The examples refer to Figures: 

Detailed description of the Figures.

Figure 1. Molecular serotype identification based on the sequence heterogeneity
of the 3'-end of cpsD-cpsE-cpsF and the 5'-end of cpsG (relevant primers are
shown).

Figure 2. Algorithm for GBS molecular serotype (MS) identification by PCR and 
sequencing. 

Figure 3. Multiple sequence alignments of the gene sequences of cpsG-cpsH-cpsI/M
for serotypes Ia, Ib, II, III, IV, V and VI (start and stop codons are
highlighted in bold).
Figure 4. Two sites (*) of sequence heterogeneity between alp2 (AF208158,
upper lines) and alp3 (AF291065, lower lines) used to distinguish them 
(relevant primers are shown). 

EXAMPLES

MATERIALS AND METHODS

GBS reference strains and clinical isolates.

A panel of nine GBS serotypes (Ia to VIII) was kindly provided by Dr
Lawrence Paoletti, Channing Laboratory, Boston, USA (reference panel 1). Dr
Diana Martin, Streptococcus Reference Laboratory, ESR, Wellington, New
Zealand, provided another panel of nine international reference GBS type
strains including serotypes Ia to VI (reference panel 2) (Table 1). In addition, we
tested isolates from 205 clinical cases, including 146 that had been referred
from various laboratories in New Zealand for serotyping and 59 isolated from
normally sterile sites over a period of 10 years in one diagnostic laboratory in
Sydney. One culture was subsequently shown to be mixed, so 206 different
isolates were examined. Conventional serotyping (CS) was performed at the
Streptococcus Reference Laboratory, ESR, Wellington, New Zealand, and molecular
serotyping (MS) at the Centre for Infectious Diseases and Microbiology Laboratory
Services, ICPMR, Sydney, Australia.

The two panels of GBS reference strains and 63 selected clinical isolates
were studied in more detail by sequencing >2200 base pairs (bp) of each to
identify appropriate sequences for use in MS. These and the remaining clinical
isolates were then used to evaluate the MS method and compare results with
those of CS. Typing by each method was initially done without knowledge of
the results of the other.

Bacterial isolates were retrieved from storage by subculture on blood agar
plates (Columbia II agar base supplemented with 5% horse blood) and
incubated overnight at 37°C. 

Conventional serotyping (CS). 

CS was performed using standard methodology (Wilkinson and Moody,
1969). Briefly, an acid-heated (56°C) extract was prepared for each isolate and
the serotype determined by immunoprecipitation with type-specific antisera in
agarose. An isolate was considered positive for a particular serotype when the
precipitin line formed a line of identity with that of the control strain.
Antisera used were prepared at ESR in rabbits against serotypes Ia, Ib, Ic, II, III,
IV, V and the R protein antigen. Fourteen selected isolates, including six that
were nontypable using antisera against serotypes I-V, six that initially gave
discrepant results between CS and MS, and two separate isolates from a mixed
culture, were kindly tested using antisera against all serotypes by Abbie
Weisner and Dr Androulla Efstratiou at the Central Public Health Laboratory,
Colindale, London, UK.

Molecular serotype identification (MS): development of method.

Oligonucleotide primers. 

The oligonucleotide primers used in this study, their target sites and
melting temperatures are shown in Tables 2, 6 and 10. Their specificities and
expected amplicon lengths are shown in Tables 3, 7 and 11. The primers
were synthesised to our specifications by Sigma-Aldrich (Castle Hill,
NSW, Australia). Four previously published oligonucleotide primers and a
series of new primers designed by us were used to sequence the genes of
interest, namely the 16S/23S rRNA intergenic spacer region and partial cps gene
cluster, or to amplify unique sequences of individual GBS serotypes. Six
previously published oligonucleotide primers and a series of new primers
designed by us were used to sequence parts of and/or to specifically amplify
genes encoding GBS surface proteins. We also designed a series of primers to
sequence parts of and/or to specifically amplify five known GBS mobile genetic
elements. Some were designed with high melting temperatures (>70°C) for
use in rapid-cycle PCR.

DNA preparation and polymerase chain reaction (PCR). 

Five individual GBS colonies or a sweep of culture were sampled using a
disposable loop and resuspended in 1 mL of digestion buffer (10 mM Tris-HCl,
pH 8.0, 0.45% Triton X-100 and 0.45% Tween 20) in 2 mL Eppendorf tubes. The
tubes containing GBS suspension were heated at 100°C (dry block heater or
water bath) for 10 minutes, then quenched on ice and centrifuged for 2 minutes
at 14,000 rpm to pellet the cell debris. A 5 µL aliquot of each supernatant containing
extracted DNA was used as template for PCR (Mawn et al., 1993).

PCR mixtures (25 µL for detection only, 50 µL for detection and
sequencing) were used as previously described (Kong et al., 1999). The
denaturation, annealing and elongation temperatures and times used were 96°C
for 1 second, 55-72°C (according to the primer Tm values or as previously
described) for 1 second and 74°C for 1 to 30 seconds (according to the length of
amplicons), respectively, for 35 cycles.
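The cycling conditions above can be captured as a small data structure that checks the stated ranges and gives the total time held at the set temperatures. This Python sketch is illustrative only: the class and field names are hypothetical, the choice of annealing temperature and elongation time within the stated ranges is left to the caller, and instrument ramp times (which matter in rapid-cycle PCR) are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CyclingProfile:
    """Hypothetical container for the three-step cycling conditions in the text."""
    anneal_c: float            # 55-72 °C, chosen from the primer Tm values
    elongation_s: float        # 1-30 s, chosen from the amplicon length
    denature_c: float = 96.0
    denature_s: float = 1.0
    anneal_s: float = 1.0
    elongation_c: float = 74.0
    cycles: int = 35

    def __post_init__(self):
        if not 55 <= self.anneal_c <= 72:
            raise ValueError("annealing temperature outside the 55-72 °C range")
        if not 1 <= self.elongation_s <= 30:
            raise ValueError("elongation time outside the 1-30 s range")

    def total_hold_seconds(self) -> float:
        """Time spent at the three set temperatures; ramp times are ignored."""
        return self.cycles * (self.denature_s + self.anneal_s + self.elongation_s)


profile = CyclingProfile(anneal_c=70, elongation_s=10)
print(profile.total_hold_seconds())
```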
Aliquots (10 µL) of PCR products were analysed by electrophoresis on 1.5%
agarose gels stained with 0.5 µg mL⁻¹ ethidium bromide. For
detection and/or serotype identification, the presence of a PCR amplicon of the
expected length, visualised by ultraviolet transillumination, was accepted as
positive. For sequencing, 40 µL volumes of PCR products were further purified
by the polyethylene glycol precipitation method (Ahmet et al., 1999).
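The acceptance rule above (a band of the expected length counts as positive) can be written as a small scoring function. The Python sketch below assumes a ±10% sizing tolerance, which the text does not specify, and uses hypothetical lengths in place of the expected amplicon lengths listed in Tables 3, 7 and 11. Reporting an "unexpected size" separately matters because an unusually long band, such as the bag gene amplicon carrying an insertion sequence, warrants sequencing rather than a simple call.

```python
def score_band(observed_bp, expected_bp, tolerance=0.10):
    """Score one lane against the expected amplicon length.

    observed_bp: estimated band size in bp, or None if no band was seen.
    tolerance: allowed fractional sizing error (an assumed value).
    """
    if observed_bp is None:
        return "negative"
    if abs(observed_bp - expected_bp) <= tolerance * expected_bp:
        return "positive"
    return "unexpected size"


# Hypothetical lanes: no band, a band near 500 bp, and a much longer band.
for band in (None, 520, 1900):
    print(band, score_band(band, expected_bp=500))
```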

Sequencing. 

The PCR products were sequenced using Applied Biosystems (ABI) Taq
DyeDeoxy terminator cycle-sequencing kits according to standard protocols.
The corresponding amplification primers or inner primers were used as the
sequencing primers.

Multiple sequence alignments and sequence comparison. 

Multiple sequence alignments were performed with the Pileup and Pretty
programs in the Multiple Sequence Analysis program group. Sequences were
compared using the Bestfit program in the Comparison program group. All programs
are provided in WebANGIS, ANGIS (Australian National Genomic Information
Service), 3rd version.

Nucleotide sequence accession numbers.

The new sequence data described have been submitted to the GenBank
Nucleotide Sequence Databases and allocated the following accession numbers:
AF291411-AF291419 (16S/23S rRNA intergenic spacer regions for serotype Ia
to VIII reference strains from reference panel 1); AF332893-AF332917,
AF363032-AF363060, AF367973, AF381030 and AF381031 (partial cps gene
clusters for two panels of reference strains [Table 1] and selected representative
clinical isolates); AF367974 (partial bag gene sequence, with an insertion
sequence IS1381, from one isolate); AF362685-AF362704 (partial bag gene
sequences for all bag-positive isolates); and AF373214 (partial rib-like gene for
reference strain Prague 25/60, an R protein standard strain).

Previously reported sequence data referred to herein have appeared in
the GenBank Nucleotide Sequence Databases with the following accession
numbers: AB023574 (16S rRNA gene); U39765, L31412 (16S/23S rRNA
intergenic spacer regions); X68427 (S. oralis 23S rRNA gene); X72754 (cfb gene);
AB028896 (cps gene cluster for serotype Ia); AB050723 (partial cps gene cluster
for serotype Ib); AF163833 (cps gene cluster for serotype III); AF355776 (cps
gene cluster for serotype IV); AF349539 (cps gene cluster for serotype V);
AF337958 (cps gene cluster for serotype VI); M97256 (bca gene); X58470,
X59771 (bag gene); U58333 (rib gene); AF208158 (alp2 gene); AF291065-
AF291072 (alp3 gene); AF064785 (IS1381); M22449 (IS567); Y14270 (IS1548);
AF064785 (IS/557); AF165983 (ISSa4); and AJ292930 (GBSi1).

Example 1 - Study of inter- and intra-serotype/serosubtype sequence
heterogeneity in specific regions of the GBS genome and assessment of 
suitability for molecular serotyping/serosubtyping. 

Polymerase chain reaction. 

With two exceptions, all GBS-specific primer pairs produced amplicons
of the expected size from all reference strains and clinical isolates tested (Table
3). The exceptions were Sag59/Sag190 and CFBS/CFBA. Both target the cfb
gene, but failed to produce amplicons from one clinical isolate, despite repeated
attempts. We assumed that this isolate either lacked the cfb gene or that the
gene was present in a mutant form. It has been suggested previously that PCR
targeting the cfb gene will not identify all GBS isolates (Hassan et al., 2000) and
that another primer pair based on the 16S rRNA gene, DSF2/DSR1 (Ahmet et al.,
1999), was not entirely specific. Therefore, in this study, we used both primer
pairs (DSF2/DSR1 and Sag59/Sag190) to confirm that all the isolates were GBS.

Sequence heterogeneity of 16S/23S rRNA intergenic spacer regions.

The 16S/23S rRNA intergenic spacer regions were sequenced for
serotypes Ia to VIII from reference panel 1. Multiple sequence alignment
showed differences between serotypes at only two positions: 207 (serotype V is
T or C [T/C], serotypes VII and VIII are C, others are T) and 272 (serotype Ib is
T, others G). These regions are therefore unsuitable for MS.
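Finding the informative positions in a multiple alignment, as done above and for the cps region below, amounts to listing the columns at which the aligned sequences disagree. The Python sketch below is a generic illustration, not the Pileup/Pretty workflow used in the study. It assumes the sequences are already aligned to equal length, treats gap and ambiguity characters (such as Y for C/T) as ordinary characters, and uses short hypothetical sequences.

```python
def variable_positions(aligned):
    """Return 1-based alignment columns at which the sequences do not all agree.

    aligned: dict of name -> aligned sequence (equal lengths, '-' for gaps).
    Ambiguity codes are compared as ordinary characters.
    """
    seqs = list(aligned.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    return [i + 1 for i in range(length) if len({s[i] for s in seqs}) > 1]


# Hypothetical aligned fragments from three serotypes.
toy = {"A": "ACGTTGCA", "B": "ACGTCGCA", "C": "ACGTTGCG"}
print(variable_positions(toy))
```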

Sequence heterogeneity at the 3'-end of cpsD-cpsE-cpsF and the 5'-end of
cpsG.

Using a series of primers targeting the 3'-end of cpsD-cpsE-cpsF and the
5'-end of cpsG, we amplified and sequenced 2226 or 2217 bp (depending on the
presence or absence of a nine-base repetitive sequence) from both panels of
reference strains (serotypes Ia to VII) and 63 selected clinical isolates.
Representative sequences were deposited in GenBank. See Table 1 for
GenBank accession numbers of reference panel strains.

Repetitive sequence.

At the 3'-end region of cpsD, we found a nine-base repetitive sequence (TTA
CGG CGA) in most isolates of MS Ia and II, some of MS III, and all of MS IV, V, and
VII, but in none of the isolates of MS Ib or VI examined (Table 4). The presence
or absence of this repetitive sequence can be used to further subtype MS Ia, II
and III (see below).
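Because the two segment lengths reported above (2226 and 2217 bp) differ by exactly the length of the nine-base repeat, the presence or absence of the repeat can be read from the sequenced segment length. The Python sketch below assumes that the full segment was sequenced and that it carries no other insertions or deletions; any other length is flagged for inspection rather than called. The constant and function names are hypothetical.

```python
REPEAT = "TTACGGCGA"                      # nine-base repetitive sequence in cpsD
WITH_REPEAT_BP, WITHOUT_REPEAT_BP = 2226, 2217


def repeat_status(segment_length_bp: int) -> str:
    """Infer presence of the nine-base repeat from the sequenced segment length."""
    if segment_length_bp == WITH_REPEAT_BP:
        return "repeat present"
    if segment_length_bp == WITHOUT_REPEAT_BP:
        return "repeat absent"
    return "unexpected length: inspect the sequence"


# The length difference equals the repeat length.
print(WITH_REPEAT_BP - WITHOUT_REPEAT_BP == len(REPEAT))
print(repeat_status(2226), repeat_status(2217))
```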

Intra-serotype heterogeneity.

In general, intra-serotype heterogeneity was low: there were minor random
variations in a few isolates of all serotypes except MS III, in which the
intra-serotype heterogeneity was more complex. MS III could be divided into four
sequence subtypes on the basis of heterogeneity at 22 positions (62, 139, 144,
204, 300, 321, 429, 437, 457, 486, 602, 636, 971, 1026, 1194, 1413, 1501,
1512, 1518, 1527, 1629, and 2134) and the presence or absence of the repetitive
sequence (at 78-86) (Table 4).

Among 60 MS III isolates (58 clinical isolates and two reference strains),
serosubtypes III-1 (30 isolates) and III-2 (22 isolates) were predominant. The
repetitive sequence was present in serosubtype III-1 but not III-2; there were
differences at seven other sites (139, 144, 204, 300, 321, 636, and 1629).

There were five isolates belonging to serosubtype III-3, which contained
the repetitive sequence and were identical with serosubtype III-1 at three
variable sites (139, 144, and 300) and with serosubtype III-2 at four (204, 321,
636 and 1629). Serosubtype III-3 differed from both serosubtypes III-1 and III-2
at seven sites (486, 1026, 1413, 1512, 1518, 1527, and 2134). These seven sites
in serosubtype III-3 were identical with the corresponding sites of MS Ia.

There were three serosubtype III-4 isolates, whose sequences were nearly
identical with the corresponding sequence of MS II. The only exception was at
position 437, where the nucleotide was T in serosubtype III-4 (as in MS VII)
and C in MS II. This difference can be used (in addition to PCR, see below) to
differentiate serosubtype III-4 from MS II. Two serosubtype III-4 isolates
contained the repetitive sequence and the other did not. Because of the small
number of serosubtype III-4 isolates, we did not use the repetitive sequence to
subtype them further.
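Subtype assignment of this kind compares a query's bases at the diagnostic positions with those of each reference subtype and counts mismatches. The Python sketch below illustrates that comparison and why a mosaic isolate such as serosubtype III-3 is awkward: it is close to both III-1 and III-2. The positions are taken from the text, but the bases are placeholders chosen for illustration, not the real GBS alleles (which are given in Table 4).

```python
def rank_profiles(query, profiles):
    """Rank reference subtypes by mismatches at shared diagnostic positions.

    query and each profiles[label]: dict of alignment position -> base.
    Returns a list of (mismatches, label), closest first.
    """
    scores = []
    for label, ref in profiles.items():
        shared = query.keys() & ref.keys()
        scores.append((sum(query[p] != ref[p] for p in shared), label))
    return sorted(scores)


# Placeholder bases (NOT real data) at the seven sites separating III-1 and III-2.
profiles = {
    "III-1": {139: "A", 144: "G", 300: "C", 204: "T", 321: "A", 636: "G", 1629: "C"},
    "III-2": {139: "G", 144: "A", 300: "T", 204: "C", 321: "G", 636: "A", 1629: "T"},
}
# A III-3-like query: matches III-1 at 139/144/300 and III-2 at 204/321/636/1629.
query = {139: "A", 144: "G", 300: "C", 204: "C", 321: "G", 636: "A", 1629: "T"}
print(rank_profiles(query, profiles))
```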
Inter-serotype heterogeneity.

There were 56 sites of heterogeneity between the eight MS (Table 4). The
most suitable sites for use in PCR/sequencing for MS were a group of 23 sites
nearest to the 3'-end of the region (Table 4, Figure 1). Firstly, they were
consistent across the two panels of reference strains and most clinical isolates (the
only exceptions were the small number of serosubtype III-3 and III-4 isolates,
see below). Secondly, they were relatively concentrated within a 790 bp region,
which is a convenient length for sequencing in a single reaction. Thirdly, they
contained enough heterogeneity sites to allow differentiation, with few
exceptions, of MS Ia-VII. Based only on this 790 bp region, serosubtype III-3
cannot be distinguished from MS Ia, nor serosubtype III-4 from MS II.
However, they can be identified by MS III-specific PCR (see below).

Serotype VIII does not form amplicons with primer pairs targeting the
790 bp region, but can be identified by exclusion after PCR identification of
GBS. In this study, one MS VIII isolate was identified, for which none of the
primer pairs that amplify the 2226 bp region (in addition to those that amplify
the 790 bp region) produced amplicons. This result was confirmed by the use of
serotype VIII-specific antiserum.

Mixed serotype-specificities in single isolates.

Eleven isolates were identified as one MS on the basis of the MS-specific
PCR and overall sequence (within the 2226/2217 bp segment), but their
sequences differed at some sites from isolates of the same MS and shared
site-specific characteristics of another. They included five serosubtype III-3 isolates
and three serosubtype III-4 isolates (see above). One non-serotypable reference strain
(Prague 25/60), which was identified as MS II, differed from other MS II isolates
at five sites at the 5'-end of the region, and was identical with MS III at three of
these sites; MS III-specific PCR of Prague 25/60 was negative. One clinical isolate,
identified as CS II and as MS II on the basis of its overall sequence, had bases at
nine sites at the 5'-end of the region that were characteristic of serotype Ib; MS
Ib-specific PCR was negative. Finally, one CS V reference strain (Prague 10/84)
had the same sequencing result as the corresponding sequence in GenBank
(AF349539), but both differed at three sites at the 5'-end of the region
from the sequences of the other MS V strains that we studied.

All of these mixed-serotype specificities, except for those associated with
serosubtypes III-3 and III-4, occurred at the 5'-end region of the 2226/2217 bp
fragment. This supported our selection of the 790 bp 3'-end as the sequencing
target for MS. Using this target, all MS were correctly identified except for MS
III belonging to serosubtypes III-3 and III-4, which can be identified by
MS III-specific PCR (see Example 2).

Example 2 - Molecular serotype identification (MS) based on MS-specific PCR
targeting the 3'-end of cpsG-cpsH-cpsI/cpsM.

Our sequence alignment results showed that there was significant
sequence heterogeneity in the 3'-end of cpsG-cpsH-cpsI/cpsM (Figure 3), which
makes it appropriate for use in the design of specific primer pairs for
differentiation of serotypes Ia, Ib, III, IV, V, and VI directly by PCR. To meet
possible future requirements (for example, development of
multiplex PCR and/or further evaluation of the sequence typing
method), we designed several primer pairs for each serotype (Tables 2 & 3).
Using two panels of reference strains and the specified conditions, all primer
pairs amplified DNA only from the corresponding serotypes. When clinical
isolates were tested, similar results were obtained with two sets of MS-specific
primer pairs. In general, more stringent conditions (lower primer concentration,
higher annealing temperatures) could be used with primers generating smaller
amplicons. Those selected for MS are shown in Table 3 and Figure 2.

An MS was assigned by PCR to 179 of 206 (86.9%) clinical isolates as
follows: MS Ia, 40; MS Ib, 35; MS III, 58 (including those previously identified as
serosubtypes III-3 and III-4); MS IV, 7; MS V, 36; MS VI, 3.

Example 3 - Comparison of serotype identification results between MS 
and CS. 

After CS and MS had been completed, the results were compared. Initial
results were discrepant for 15 isolates, all but five of which (see below) were
resolved by retesting and/or correction of clerical errors.

The CS and MS/sequence subtyping results are shown in Table 5. An MS
was assigned to all isolates by PCR and/or sequencing, compared with 188 of
206 (91.3%) by CS. Specific PCR has not yet been developed for MS II and VIII,
so all MS II isolates were determined by sequencing only and one presumptive
MS VIII isolate was identified by exclusion (see Example 1). For all other isolates,
the results of PCR and sequencing were consistent, except for serosubtypes III-3
and III-4 and the other minor sequence differences described above (Example 1). CS
results correlated well with PCR results.

Final CS and MS results were the same for all 188 isolates (100%) for
which results from both methods were available. Eighteen clinical isolates that
were non-serotypable by CS were assigned MS as follows: Ia, two; Ib, five; II,
one; serosubtype III-1, three; serosubtype III-2, one; V, five; and VI, one.

Sequences (2217 bp) of three clinical isolates that we identified as MS
VI were identical with those for serotype VI reference strains and the
corresponding sequence in GenBank (AF337958).

Mixed culture. 

Four clinical isolates gave positive results with MS III-specific PCR, but
were provisionally identified as MS II by sequencing. Three were CS III and one
CS II, with a weak cross-reaction with serotype III antiserum. These isolates
were studied further by subculturing 12 individual colonies of each. All
subcultures were tested by MS III-specific PCR. All 12 colony subcultures of
each of the three CS III isolates were positive by MS III-specific PCR, and the isolates
were therefore classified as serosubtype III-4 (see above). However, 11 of 12
colony subcultures of the fourth isolate were negative by MS III-specific PCR
and one was positive. It was therefore assumed that this
was a mixed culture, predominantly of MS/CS II. The one MS III-specific
PCR-positive colony was subsequently identified as serosubtype III-2 and included
as an additional clinical isolate (206 in total).
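The colony-subculture reasoning above reduces to a simple rule on the per-colony PCR results: uniform results indicate a homogeneous isolate, while a mixture of positive and negative colonies indicates a mixed culture. The Python sketch below encodes that rule; the function name and the wording of the returned labels are hypothetical.

```python
def interpret_colonies(results):
    """Interpret MS III-specific PCR results from individually subcultured colonies.

    results: list of booleans, one per colony (True = PCR positive).
    """
    if not results:
        raise ValueError("no colonies tested")
    positives = sum(results)
    if positives == len(results):
        return "homogeneous, MS III-specific PCR positive"
    if positives == 0:
        return "homogeneous, MS III-specific PCR negative"
    return f"mixed culture ({positives}/{len(results)} colonies positive)"


# Patterns like those described: 12/12 positive, and 1 of 12 positive.
print(interpret_colonies([True] * 12))
print(interpret_colonies([False] * 11 + [True]))
```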

Example 4 - Algorithm for serotype assignment of GBS by PCR and sequencing

As an example of how the PCR and sequencing methods described above
may be used clinically to perform GBS serotype identification, we designed an
algorithm for clinical use (Figure 2). All the primers used (except the inner
sequencing primers) were designed with high melting temperatures (>70°C), so
that rapid-cycle PCR could be used (see Table 2 for primer sequences).
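Figure 2 itself is not reproduced here, so the Python sketch below is only our reading of the decision flow described in Examples 1 to 3, not a transcription of the figure. It assumes that both GBS identification PCRs (DSF2/DSR1 and Sag59/Sag190) are run, with discordant results flagged for review because one GBS isolate was cfb-negative. It then applies the MS-specific PCRs, resolves III-3 and III-4 using the 790 bp sequence, and assigns MS VIII by exclusion. Treating more than one positive MS-specific PCR as a possible mixed culture is an added assumption.

```python
from typing import Dict, Optional, Tuple

# MS with a specific PCR in the text (no specific PCR for MS II or VIII).
MS_PCR_TARGETS = ("Ia", "Ib", "III", "IV", "V", "VI")


def assign_ms(gbs_pcr: Tuple[bool, bool],
              ms_pcr: Dict[str, bool],
              seq_call: Optional[str]) -> str:
    """Sketch of a PCR-then-sequencing decision flow for one isolate.

    gbs_pcr:  results of the two GBS identification PCRs (DSF2/DSR1, Sag59/Sag190).
    ms_pcr:   MS-specific PCR result per target (a missing key counts as negative).
    seq_call: MS indicated by sequencing the 790 bp cps region, or None if the
              primers for that region produced no amplicon.
    """
    if not any(gbs_pcr):
        return "GBS not detected"
    if not all(gbs_pcr):
        return "discordant GBS PCR: review before typing"
    positives = [ms for ms in MS_PCR_TARGETS if ms_pcr.get(ms, False)]
    if len(positives) > 1:
        return "several MS-specific PCRs positive: check for mixed culture"
    if positives == ["III"] and seq_call in ("Ia", "II"):
        # The 790 bp region cannot separate III-3 from Ia, or III-4 from II.
        subtype = "III-3" if seq_call == "Ia" else "III-4"
        return f"MS III (serosubtype {subtype})"
    if positives:
        return f"MS {positives[0]}"
    if seq_call is None:
        return "presumptive MS VIII (by exclusion)"
    return f"MS {seq_call} (by sequencing)"


print(assign_ms((True, True), {"III": True}, "II"))
```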

Example 5 - Identification of regions in the alp2, alp3 and rib genes suitable
for protein antigen gene-specific subtyping

Polymerase chain reactions.

With few exceptions, all primer pairs produced amplicons of the predicted
length from isolates giving positive results (Table 7). The exceptions included
one isolate that was positive by PCR using primer pairs GBS1360S/GBS1937A
and GBS1717S/GBS1937A (which both target the bag gene) but produced
amplicons significantly longer than those of other bag gene-positive isolates.
Sequencing showed that the amplicon contained the insertion sequence IS1381,
with minor variations compared with the published sequences (Tamura et al.,
2000). The amplicons produced using primers IgAagGBS/RIgAagGBS and
IgAS1/IgAA1 (also targeting the bag gene) varied in length (Berner et al., 1999) and
were sequenced for further subtyping (see below and Table 8).

Amplicon sequencing results.

To confirm the specificity of selected primer pairs that we had designed
or modified, we sequenced 10 of 23 amplicons produced by bcaS1/bcaA
(targeting the 5'-end of the bca gene) and all of those produced by ribS1/ribA3
(targeting the rib gene) and GBS1360S/GBS1937A (targeting the bag gene), from the two
panels of reference strains and 31 randomly selected clinical isolates.

All 10 amplicons of primers bcaS1/bcaA and 12 of 13 of primers
ribS1/ribA3 were identical with the corresponding gene sequences in GenBank
(M97256, bca gene, and U58333, rib gene, respectively). One additional isolate,
namely Prague 25/60 in reference panel 2 (which is used to raise R antiserum),
produced an amplicon with primer pair ribS1/ribA3 only at a lower annealing
temperature (55°C), but not with ribS2/ribA1 and ribS2/ribA2. It was therefore
assumed not to contain the rib gene, although the amplicon sequence showed
considerable homology with the rib gene (71.4% or 66.6% according to whether or
not the primer sequences were included) (Figure 3). This isolate was the only
one, of 224 tested, for which PCRs were negative using ribS2/ribA1 and
ribS2/ribA2 but positive using ribS1/ribA3. The latter primer pair is assumed
not to be entirely specific for the rib gene and was therefore used only for sequencing.

Four of 10 amplicons of primer pair GBS1360S/GBS1937A (targeting the bag
gene) were identical with the corresponding sequence in GenBank (X58470,
X59771). A single point mutation (A to G, position 1441 of X59771) was found in the
remaining six bag gene amplicons, including the one that contained the
insertion sequence IS1381 (see above and AF367974).

Amplicons from all isolates (of the 224 tested) that gave positive PCR results
using primer pairs bcaS1/balA (targeting alp2/alp3 genes), bal23S1/bal2A2
(targeting the alp2 gene) and IgAagGBS/RIgAagGBS (targeting the bag gene) were
sequenced.

Fifty isolates produced amplicons using primer pair bcaS1/balA. The
sequences of nine were identical with the corresponding portions of the
published sequence of the alp2 gene (AF208158) and 41 with that of the alp3 gene
(AF291065). There are two consistent heterogeneity sites between the alp2 and alp3
genes in the sequences of bcaS1/balA amplicons (Figure 4), which can be used
to distinguish them, in addition to alp2 and alp3 gene-specific PCR. All nine
amplicons of primer pair bal23S1/bal2A2 were identical with the corresponding
portion of the alp2 gene sequence in GenBank (AF208158).

The primer pair IgAagGBS/RIgAagGBS identified the bag gene in 52 isolates.
There was considerable sequence variation, which allowed separation of bag
gene-positive isolates into 11 groups and 20 subgroups based on amplicon
length and sequence heterogeneity, respectively (Table 8). The groups
contained small numbers (one to five) of isolates except for B1 (20 isolates, 2
subgroups) and B4 (11 isolates, 3 subgroups). The differences in amplicon
length were generally caused by the presence or absence of short repetitive
sequences.
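The two-level classification above (groups by amplicon length, subgroups by sequence) can be expressed as a nested grouping. The Python sketch below assumes that subgroups are defined by identical sequence within a length group; in practice, subgroups may be defined by specific heterogeneity sites rather than whole-sequence identity. Group labels such as B1 or B4 would be assigned separately, and the isolate names and sequences are hypothetical.

```python
from collections import defaultdict


def group_amplicons(amplicons):
    """Nest isolates by amplicon length (group), then by sequence (subgroup).

    amplicons: dict of isolate -> amplicon sequence.
    Returns {length: {sequence: [isolates]}} with lengths in ascending order.
    """
    groups = defaultdict(lambda: defaultdict(list))
    for isolate, seq in amplicons.items():
        groups[len(seq)][seq].append(isolate)
    return {length: dict(subs) for length, subs in sorted(groups.items())}


toy = {"iso1": "ACGTACGT", "iso2": "ACGTACGT",
       "iso3": "ACGTTCGT", "iso4": "ACGTACGTACGT"}
result = group_amplicons(toy)
print(len(result), sum(len(subs) for subs in result.values()))
```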

Further confirmation of specificity of surface protein gene-specific primer
pairs.

To confirm primer specificity, we compared the results of PCR using the
primer sequences we had designed or modified for bag gene PCR with those of
PCR using previously published primers, and found 100% correlation.

The previously reported non-specificity of the published primer pair
bcaRUS/bcaRUA (targeting the bca gene repetitive unit) was confirmed. Using
these primers, amplicons were produced from all nine alp2 gene-positive
(bcaS1/bcaA-negative) isolates and from 53 isolates that were PCR-negative using
the primers bcaS1/bcaA, bcaS2/bcaA (targeting the 5'-end of the bca gene),
bal23S1/bal2A2 and bal23S2/bal2A1 (targeting the 5'-end of the alp2 gene). Our
sequencing showed that the bca gene and the alp2 gene have significant homology
in the regions targeted by bcaRUS/bcaRUA, allowing amplicon formation from
alp2 gene-positive strains. These false-positive results could be due to the
presence of other C alpha-like proteins containing regions homologous with the
bca gene repetitive unit (bca gene repetitive unit-like sequence).

We also showed that the results of PCR using two or more primer pairs that
we had designed for individual genes (rib, alp2, and alp3 genes) correlated well,
supporting the specificity of each set. The only exception, as mentioned above,
was ribS1/ribA3, which produced a non-specific amplicon from one of 224
isolates tested.

Example 6 - The relationship between surface protein antigen gene profiles
and cps serotypes/serosubtypes.

Surface protein gene profiles.

For each gene (except the bca gene repetitive unit or the bca gene repetitive
unit-like region), we selected two primer pairs to identify and characterise GBS
surface proteins by PCR. Each isolate was given a protein gene profile code
according to the PCR results as follows:

"A": 5'-end of the bca gene amplified by bcaS1/bcaA and bcaS2/bcaA;

"a" or "as": bca gene repetitive unit or bca gene repetitive unit-like
region amplified by bcaRUS/bcaRUA, with multiple-band or single-band
amplicons, respectively;

"B": bag gene amplified by GBS1360S/GBS1937A and
IgAagGBS/RIgAagGBS;

"R": rib gene amplified by ribS2/ribA1 and ribS2/ribA2;

"alp2": alp2 gene amplified by bal23S1/bal2A2 and bal23S2/bal2A1; and

"alp3": alp3 gene amplified by bal23S1/bal3A and bal23S2/bal3A (Table 7).
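The coding scheme above can be applied mechanically to a table of PCR results. In the Python sketch below, a marker is recorded only when both of its primer pairs are positive, and "a" versus "as" is taken from the bcaRUS/bcaRUA band pattern. The order in which codes are concatenated (alp2, alp3, A, a/as, B, R) is an assumption chosen to reproduce the profile strings "AaB" and "alp2as" used in the text; the label "none" for isolates without markers is also hypothetical.

```python
from typing import Dict, Optional

# Marker codes and the two primer pairs that must both be positive.
EARLY_MARKERS = (
    ("alp2", ("bal23S1/bal2A2", "bal23S2/bal2A1")),
    ("alp3", ("bal23S1/bal3A", "bal23S2/bal3A")),
    ("A", ("bcaS1/bcaA", "bcaS2/bcaA")),
)
LATE_MARKERS = (
    ("B", ("GBS1360S/GBS1937A", "IgAagGBS/RIgAagGBS")),
    ("R", ("ribS2/ribA1", "ribS2/ribA2")),
)


def profile_code(pcr: Dict[str, bool], rus_bands: Optional[str] = None) -> str:
    """Build a protein gene profile code from PCR results.

    pcr: primer pair name -> result (missing = negative).
    rus_bands: bcaRUS/bcaRUA pattern, "multiple", "single" or None.
    """
    def both(pairs):
        return all(pcr.get(p, False) for p in pairs)

    parts = [code for code, pairs in EARLY_MARKERS if both(pairs)]
    if rus_bands == "multiple":
        parts.append("a")
    elif rus_bands == "single":
        parts.append("as")
    parts += [code for code, pairs in LATE_MARKERS if both(pairs)]
    return "".join(parts) or "none"


print(profile_code({"bcaS1/bcaA": True, "bcaS2/bcaA": True,
                    "GBS1360S/GBS1937A": True, "IgAagGBS/RIgAagGBS": True}, "multiple"))
```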

Four common profiles accounted for 203 of 224 (90.6%) isolates: "R" (62
isolates), "AaB" (51 isolates), "a" (49 isolates) and "alp3" (41 isolates) (see Table
4). Only two isolates contained no surface protein gene markers. All but one
isolate with the bag gene ("B") also had the bca gene with its repetitive unit ("Aa");
the remaining one had the rib gene. All "alp2" isolates contained single bca
repetitive unit-like sequences ("as"). "A", "R", "alp2" and "alp3" were all mutually
exclusive. Sixty-two of 63 isolates with the rib gene ("R") and all 41 isolates with
the alp3 gene had no other protein antigen markers.

The relationship between surface protein antigen gene profiles and cps serotypes/serosubtypes.

A cps molecular serotype (MS) was assigned to all isolates in accordance with the methods described in Examples 1 to 4, and the results correlated with conventional serotyping (CS) results except for 19 of 224 isolates that were nontypable using antisera. The relationships between surface protein gene profiles and cps MS are summarised in Table 9.

The following strong associations were confirmed or demonstrated: between MS Ia and the bca gene repetitive unit or bca gene repetitive unit-like sequence (most with profile "a"); between MS serosubtypes III-1 and III-2 and the rib gene; between MS serosubtype III-3 and the alp2 gene; between MS Ib and the bca/bag genes; and between MS V and the alp3 gene. MS II showed the most varied surface protein gene profiles. However, the relationships were not absolute, and different combinations of cps serotypes and protein gene profiles produced 31 different serovariants, or 51 when bag gene ("B") subgroups were considered.
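Counting serovariants amounts to counting distinct combinations of molecular serotype and protein gene profile, optionally refined by bag gene subgroup. A minimal sketch, assuming hypothetical isolate records with "ms", "profile" and "bag_subgroup" fields; these field names and the toy data are illustrative only and do not reproduce the study data.

```python
from collections import Counter


def serovariants(isolates: list[dict], use_bag_subgroups: bool = False) -> Counter:
    """Count isolates per serovariant.

    A serovariant is keyed on (molecular serotype, protein gene profile); when
    use_bag_subgroups is True the bag subgroup (e.g. "B3a") is added to the key.
    """
    counts = Counter()
    for iso in isolates:
        subgroup = iso.get("bag_subgroup") if use_bag_subgroups else None
        counts[(iso["ms"], iso["profile"], subgroup)] += 1
    return counts


# Hypothetical isolates, for illustration only.
isolates = [
    {"ms": "Ib", "profile": "AaB", "bag_subgroup": "B1"},
    {"ms": "Ib", "profile": "AaB", "bag_subgroup": "B1"},
    {"ms": "Ib", "profile": "AaB", "bag_subgroup": "B4"},
    {"ms": "III-1", "profile": "R"},
]
print(len(serovariants(isolates)), len(serovariants(isolates, use_bag_subgroups=True)))  # 2 3
```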

Example 7 - The relationship between surface protein antigens and protein gene profiles.

Based on conventional serotyping, 33 isolates (belonging to CS Ia/c, Ib/c, IIc, IIb, IIIc or IIIb) reacted with the C antiserum. The surface protein gene profiles of all these isolates contained the bca gene ("A") or bca gene repetitive unit-related markers ("a" or "as"): Aa, 3; AaB, 18; a, 11; alp2as, 1. Twenty-nine isolates reacted with the R antiserum and, of these, 22 contained the rib gene and six the alp3 gene. The strain used to raise the R protein antiserum (Prague 25/60) contained a presumed rib-like gene (see above and Figure 3).

Example 8 - Identification of mobile genetic elements suitable for molecular subtyping

We developed a series of PCR primers to screen for the presence of five 
mobile elements in GBS serotypes. 

Specificity of primer pairs.

All the primer pairs produced amplicons of the expected lengths (Table 11) from some reference and/or some clinical isolates (Table 12). To evaluate the specificity of our primer pairs, we sequenced all amplicons produced by primers IS1548S/IS1548A3 and ISSa4S/ISSa4A2, and selected amplicons, from both reference and clinical isolates, produced by IS861S/IS861A2 (12 isolates), IS1381S1/IS1381A (24 isolates) and GBSi1S1/GBSi1A2 (11 isolates).
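The expected amplicon lengths can be cross-checked against the primer coordinates in Table 10. A minimal sketch, assuming that the numbers flanking each primer sequence in Table 10 are the reference positions of its 5' and 3' ends, so that a product runs from the 5' end of the sense primer to the 5' end of the antisense primer, inclusive. The function name is illustrative.

```python
def amplicon_length(sense_5prime: int, antisense_5prime: int) -> int:
    """Expected amplicon length in bp, given the reference positions of the 5' ends
    of the sense and antisense primers (both primers included in the product)."""
    return abs(antisense_5prime - sense_5prime) + 1


# Coordinates read from Table 10.
print(amplicon_length(445, 1020))   # IS861S/IS861A2 on M22449: 576
print(amplicon_length(272, 881))    # IS1381S1/IS1381A on AF064785: 610
print(amplicon_length(818, 1424))   # IS1381S1/IS1381A on AF367974: 607
```

Under this assumption, the 3-bp difference between the two IS1381S1/IS1381A products matches the three-bp deletion in AF367974 described below.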

All 41 IS1548 and 15 ISSa4 amplicon sequences were identical with the corresponding sequences in GenBank (Y14270 and AF165983, respectively). Five of 12 IS861 amplicon sequences were identical with the corresponding IS861 sequence in GenBank (M22449). The other seven differed from the published sequence at position 732 (G to A), and the reference strain Prague 25/60 had two additional differences, G to A and T to A, at positions 576 and 830 of M22449, respectively.

Previously, we found a full-length insertion sequence IS1381 (AF367974) within the C beta antigen gene of a clinical isolate, with several differences compared with the original published sequence (AF064785): the terminal inverted repeats contained 15, rather than 20, base pairs (bp), and there was a three-bp deletion and four individual bp differences in the putative transposase pseudogene between positions 419 and 429 of the original GenBank sequence (GGG ATC CGA TT in AF064785 vs CAG A-- -GG TA in AF367974, our sequence). All amplicons of primer pair IS1381S1/IS1381A from 12 reference and 12 selected clinical isolates were identical with each other and with our IS1381 sequence in GenBank (AF367974), but differed, as above, from the originally reported IS1381 sequence (AF064785).
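Differences such as these can be listed mechanically from an alignment. A minimal sketch, assuming the two sequences are already aligned position by position, with '-' marking a base missing from the query and no gaps in the reference (so reference numbering applies directly). Applied to positions 419 to 429, and reading the AF367974 segment as CAG A-- -GG TA (three gap characters, consistent with the three-bp deletion described), it recovers four substitutions and a three-base deletion. The function name is illustrative.

```python
def compare_aligned(ref: str, query: str, start: int):
    """Return (substitutions, deleted_positions) between two aligned sequences.

    Spaces are ignored; '-' in the query marks a base missing relative to the
    reference. Positions are numbered along the reference, beginning at `start`.
    """
    ref, query = ref.replace(" ", ""), query.replace(" ", "")
    if len(ref) != len(query):
        raise ValueError("aligned sequences must have equal length")
    substitutions, deletions = [], []
    for offset, (r, q) in enumerate(zip(ref, query)):
        position = start + offset
        if q == "-":
            deletions.append(position)
        elif r != q:
            substitutions.append((position, r, q))
    return substitutions, deletions


subs, dels = compare_aligned("GGG ATC CGA TT", "CAG A-- -GG TA", start=419)
print(subs)  # [(419, 'G', 'C'), (420, 'G', 'A'), (427, 'A', 'G'), (429, 'T', 'A')]
print(dels)  # [423, 424, 425]
```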

The amplicons of primer pair GBSi1S1/GBSi1A2 from all four GBSi1-positive reference strains and seven selected clinical isolates were sequenced. Six (including those of three reference strains) were identical with the corresponding GBSi1 sequence in GenBank (AJ292930). Amplicons from four clinical isolates showed three site variations (C to T at position 767, A to C at position 846 and T to C at position 923 of the AJ292930 sequence). The reference strain Prague 25/60 showed only the first two of these site variations.

In addition to sequencing, we evaluated the specificity of our primer pairs by comparing the PCR results for two or more primer pairs for each target (Table 11). In all cases, the same sets of isolates gave positive results when tested with PCR targeting the same mobile genetic element, confirming the specificity of the primer pairs.
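The concordance check between primer pairs targeting the same element reduces to comparing two sets of positive/negative calls. A minimal sketch, with hypothetical isolate identifiers and results; an empty result corresponds to the situation reported above, in which the same isolates were positive with every primer pair for a given element.

```python
def discordant_isolates(pair_a: dict[str, bool], pair_b: dict[str, bool]) -> list[str]:
    """Isolates tested with both primer pairs whose results (amplicon yes/no) disagree."""
    shared = pair_a.keys() & pair_b.keys()
    return sorted(iso for iso in shared if pair_a[iso] != pair_b[iso])


# Hypothetical results for two primer pairs aimed at the same element.
is861_a1 = {"G001": True, "G002": False, "G003": True}
is861_a2 = {"G001": True, "G002": False, "G003": True}
print(discordant_isolates(is861_a1, is861_a2))  # []
```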

PCR results using specific primer pairs for all five mobile genetic elements.

IS861, IS1548, IS1381, ISSa4 and GBSi1 were identified in 55%, 18%, 85%, 7% and 19% of isolates, respectively. None of the mobile elements was detected in 10 (4%) isolates. The distributions of the five mobile elements identified by PCR in 224 GBS isolates are shown in Table 12. IS1381 was detected alone in 79 isolates and GBSi1 alone in one. Forty-six isolates contained two different insertion sequences (IS861 and IS1381, 42 isolates; IS1548 and IS1381, three isolates; ISSa4 and IS1381, one isolate). Forty-four isolates contained three (IS861, IS1548 and IS1381, 34; IS861, ISSa4 and IS1381, 10) and one contained all four insertion sequences. Forty-one isolates contained GBSi1 in combination with one (IS861, 22; IS1381, one isolate), two (IS861 and IS1381, 11; ISSa4 and IS1381, three isolates) or three (IS861, IS1548 and IS1381, four isolates) insertion sequences.
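Prevalence figures and combination counts of this kind can be tallied from per-isolate PCR calls. A minimal sketch, assuming each isolate is represented as the set of elements for which it was PCR positive; the four isolates below are hypothetical and do not reproduce the study data.

```python
from collections import Counter

ELEMENTS = ("IS861", "IS1548", "IS1381", "ISSa4", "GBSi1")


def prevalence(isolates: list[set[str]]) -> dict[str, float]:
    """Percentage of isolates carrying each mobile element (one decimal place)."""
    counts = Counter(e for iso in isolates for e in iso)
    return {e: round(100 * counts[e] / len(isolates), 1) for e in ELEMENTS}


def combinations(isolates: list[set[str]]) -> Counter:
    """Number of isolates with each exact combination of elements (empty set = none)."""
    return Counter(frozenset(iso) for iso in isolates)


# Hypothetical PCR results for four isolates.
toy = [{"IS1381"}, {"IS1381"}, {"IS861", "IS1381"}, set()]
print(prevalence(toy))  # {'IS861': 25.0, 'IS1548': 0.0, 'IS1381': 75.0, 'ISSa4': 0.0, 'GBSi1': 0.0}
print(combinations(toy)[frozenset({"IS1381"})])  # 2
```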

Example 9 - The relationships between cps serotypes, serosubtypes, surface protein gene profiles and mobile genetic elements.

The distribution of each of the five mobile genetic elements in different cps serotypes, serotype III subtypes and surface protein gene profiles is shown in Table 12. The most consistent findings for each serotype/serosubtype were:






1) Serotype Ia - most (>80%) expressed proteins closely related to C alpha protein and contained IS1381.

2) Serotype Ib - most (>90%) expressed C alpha and C beta proteins and contained IS861 and IS1381.

3) Serotype II - exhibited two common patterns:

a) >50% expressed C alpha protein (and often C beta) and contained IS861, IS1381 and sometimes other mobile elements, especially ISSa4; or

b) >25% expressed Rib protein and contained IS861, IS1381 and GBSi1.

4) Serosubtype III-1 - all expressed Rib protein and contained IS861, IS1548 and IS1381 but not GBSi1.

5) Serosubtype III-2 - all expressed Rib protein and contained IS861 and GBSi1 but neither IS1548 nor IS1381.

6) Serosubtype III-3 - all expressed C alpha-like protein 2 and contained no mobile genetic elements.

7) Serosubtype III-4 - expressed various proteins; all contained GBSi1.

8) Serotype IV - most expressed proteins closely related to C alpha protein and contained IS1381.

9) Serotype V - most expressed C alpha-like protein 3 and contained IS1381.

10) GBSi1 and IS1548 were mutually exclusive in serotype III (III-1, III-2 and III-4) but not in serotype II.

11) All isolates that expressed C alpha-like protein 2 contained no insertion sequences.
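Statements such as finding 10 (mutual exclusivity of GBSi1 and IS1548 within serotype III but not within serotype II) can be checked directly against isolate records. A minimal sketch with hypothetical records; the field names "ms" and "elements" are illustrative assumptions.

```python
def mutually_exclusive(isolates: list[dict], a: str, b: str, groups=None) -> bool:
    """True if no isolate (optionally only those whose MS is in `groups`) carries both a and b."""
    return not any(
        a in iso["elements"] and b in iso["elements"]
        for iso in isolates
        if groups is None or iso["ms"] in groups
    )


# Hypothetical isolates, for illustration only.
data = [
    {"ms": "III-1", "elements": {"IS861", "IS1548", "IS1381"}},
    {"ms": "III-2", "elements": {"IS861", "GBSi1"}},
    {"ms": "II", "elements": {"IS861", "IS1548", "IS1381", "GBSi1"}},
]
print(mutually_exclusive(data, "GBSi1", "IS1548", {"III-1", "III-2", "III-4"}))  # True
print(mutually_exclusive(data, "GBSi1", "IS1548", {"II"}))                       # False
```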

DISCUSSION 

Capsule production in GBS is controlled by the capsular polysaccharide synthesis (cps) gene cluster, which had been sequenced for serotypes Ia and III before we began our study. Corresponding sequences for serotype Ib (Miyake et al., 2001, submitted to GenBank, accession number AB050723) and for serotypes IV, V and VI (McKinnon et al., 2001, submitted to GenBank, accession numbers AF355776, AF349539 and AF337958, respectively) were released recently, when the project was nearly finished, but the sequences of the cps gene clusters for the other three serotypes (II, VII and VIII) have not been published previously.

The sequences of the cps gene clusters for serotypes Ia and III showed considerable homology in the region spanning the 3'-end of cpsD, cpsE, cpsF and the 5'-end of cpsG. We designed a series of primers to amplify a 2226/2217 bp segment in this region and found that amplicons were obtained from all serotypes except VIII.






This confirmed a previous suggestion that serotype VIII is significantly different from other serotypes in this region.

Using eight serotype (Ia to VII) reference strains, we showed more than 50 heterogeneity points between serotypes (Figure 1, Table 4). Using 63 selected clinical isolates that had been serotyped by conventional methods, we found that these inter-serotype differences were generally consistent and specific, especially the 23 sites clustered at the 3'-end of the region. We used these differences to assign serotypes to the remaining clinical isolates collected in this study, without knowledge of the serotype obtained by conventional methods.
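Assigning a serotype from heterogeneity points amounts to comparing the bases an isolate carries at informative positions with the pattern expected for each serotype. A minimal sketch, assuming a simple count of matching sites; the positions and bases below are hypothetical placeholders, not the actual sites of Figure 1 and Table 4. Returning all tied serotypes is a design choice here, so that ambiguous isolates are flagged rather than silently assigned.

```python
def assign_by_sites(query: dict[int, str], patterns: dict[str, dict[int, str]]):
    """Return (best-matching serotypes, number of matching sites).

    query: base observed at each informative position of the amplicon.
    patterns: serotype -> {position: expected base}.
    """
    scores = {
        sero: sum(query.get(pos) == base for pos, base in sites.items())
        for sero, sites in patterns.items()
    }
    best = max(scores.values())
    return sorted(s for s, v in scores.items() if v == best), best


# Hypothetical positions and bases, for illustration only.
patterns = {
    "Ia": {101: "A", 215: "G", 388: "T"},
    "III": {101: "A", 215: "C", 388: "C"},
    "V": {101: "G", 215: "C", 388: "T"},
}
print(assign_by_sites({101: "A", 215: "C", 388: "C"}, patterns))  # (['III'], 3)
```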

Sequence analysis of the 3'-end of cpsG-cpsH-cpsI/cpsM for serotypes Ia, III, Ib, IV, V and VI showed that this region is highly variable (Figure 3), making it a suitable target for direct serotype identification by PCR. We designed several pairs of MS-specific primers for MS Ia, Ib, III, IV, V and VI and used them to test two CS reference panels. Using selected primer pairs, we identified the MS of 86.9% of our 206 clinical isolates by PCR alone. Using rapid-cycle MS-specific PCR, results are available within one working day. In future, it will be possible to extend this method to all MS when cps gene cluster sequences in this region are available for serotypes II, VII and VIII. Meanwhile, MS II and VII can be identified by sequencing the 790 bp PCR amplicons spanning the 3'-end of cpsE, cpsF and the 5'-end of cpsG (Figure 1, Table 4). A positive GBS-specific PCR together with negative PCR results for all the primers that amplify the 790 bp region identified MS VIII by exclusion.
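The PCR-first procedure described above can be summarised as a decision function. A minimal sketch, assuming the results of the GBS-specific PCR, the MS-specific PCRs and the 790 bp amplification are already available. The function and argument names are illustrative, and the handling of an isolate positive in more than one MS-specific PCR is an assumption, since the text does not describe that case.

```python
from typing import Optional

MS_SPECIFIC_PCR = {"Ia", "Ib", "III", "IV", "V", "VI"}  # MS with specific primers in the text


def molecular_serotype(gbs_positive: bool, ms_hits: set[str],
                       amplicon_790: bool, sequence_call: Optional[str] = None) -> str:
    """Sketch of the PCR-first decision procedure.

    ms_hits: MS-specific PCR reactions that gave an amplicon.
    amplicon_790: whether the 790 bp cpsE-cpsF-cpsG amplicon was obtained.
    sequence_call: MS (II or VII) read from sequencing that amplicon, if done.
    """
    if not gbs_positive:
        return "not GBS"
    unknown = ms_hits - MS_SPECIFIC_PCR
    if unknown:
        raise ValueError(f"no MS-specific PCR defined for {sorted(unknown)}")
    if len(ms_hits) == 1:
        return next(iter(ms_hits))
    if len(ms_hits) > 1:
        return "ambiguous: " + "/".join(sorted(ms_hits))
    if not amplicon_790:
        return "VIII"  # identified by exclusion
    return sequence_call or "II or VII: sequence the 790 bp amplicon"


print(molecular_serotype(True, set(), amplicon_790=False))  # VIII
```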

In future, and in some laboratories currently, sequencing the 790 bp PCR amplicons of the 3'-end of cpsE, cpsF and the 5'-end of cpsG for all isolates may be more convenient, as only one method and fewer primers are needed. However, if sequencing is not available in-house, the turn-around time is longer. In addition, a small proportion of isolates would be wrongly assigned by sequencing alone (serosubtypes III-3 and III-4 as MS Ia and II, respectively). This could be avoided by screening with MS III-specific PCR first. Sequencing the 790 bp PCR amplicon allows MS III to be subtyped on the basis of the sequence heterogeneity.
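Screening with MS III-specific PCR before interpreting the 790 bp sequence can be sketched as a correction step. This assumes, from the misassignments noted above, that in the 790 bp region serosubtype III-3 resembles MS Ia and III-4 resembles MS II; the mapping and function below are illustrative, not a validated rule.

```python
# Assumed from the text: in the 790 bp region, III-3 resembles Ia and III-4 resembles II.
SEQUENCE_LOOKALIKES = {"Ia": "III-3", "II": "III-4"}


def resolve_ms(sequence_call: str, ms_iii_pcr_positive: bool) -> str:
    """Combine a 790 bp sequence-based call with a prior MS III-specific PCR screen."""
    if not ms_iii_pcr_positive:
        return sequence_call
    if sequence_call.startswith("III"):
        return sequence_call  # the sequence already gives the III serosubtype
    if sequence_call in SEQUENCE_LOOKALIKES:
        return SEQUENCE_LOOKALIKES[sequence_call]
    return f"conflict: sequence suggests {sequence_call}, MS III PCR positive"


print(resolve_ms("Ia", ms_iii_pcr_positive=True))   # III-3
print(resolve_ms("Ia", ms_iii_pcr_positive=False))  # Ia
```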

Previous studies have shown that serotypes Ia, Ib, II, III and V are those most frequently isolated from normally sterile sites in the United States and several other countries. Serotypes VI and VIII are the predominant serotypes isolated from patients in Japan but are uncommon elsewhere. Although our isolates were selected, they were probably representative of those causing disease in Australasia: Ia, Ib, II, III and V were the most common serotypes identified, although there were small numbers of serotypes IV, VI and VIII.
Up to 13% of GBS isolates are non-serotypable, and in our study the proportion was 8.7% (18/206) using the antisera available. This may be due to decreased type-specific antigen synthesis, non-encapsulated phase variation, or insertions or mutations in genes of the cps gene cluster. One non-serotypable GBS strain in our study had a single T deletion in the cpsG gene, which shifted the cpsG reading frame.
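Why a single-base deletion disrupts the gene product can be shown with a toy sequence: every codon downstream of the deletion is read in a new frame, often producing a premature stop codon. The sequence below is hypothetical and is not the cpsG sequence of the isolate described.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq: str) -> list[str]:
    """Split a sequence into complete codons in reading frame 1."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def delete_base(seq: str, position: int) -> str:
    """Delete the base at a 1-based position."""
    return seq[:position - 1] + seq[position:]


def first_stop(seq: str) -> Optional[int]:
    """Index (0-based) of the first stop codon in frame 1, or None."""
    for i, codon in enumerate(codons(seq)):
        if codon in STOP_CODONS:
            return i
    return None


toy = "ATGGCTTTAGCC"  # hypothetical, not the cpsG sequence
print(codons(toy))                          # ['ATG', 'GCT', 'TTA', 'GCC']
print(codons(delete_base(toy, 6)))          # ['ATG', 'GCT', 'TAG']
print(first_stop(delete_base(toy, 6)))      # 2
```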

We have also developed PCR-based methods to identify GBS surface protein genes and further characterise these isolates. Using the published bag gene sequence, we modified bag gene-specific primers and designed new primers with high melting temperatures (>70 °C), suitable for rapid-cycle PCR targeting all major surface protein genes.

As previously reported, a published PCR primer pair targeting the bca gene repetitive unit (at the 3'-end of the bca gene) was not entirely specific for the bca gene. We designed two new primer pairs targeting the 5'-end of the bca gene to improve specificity. However, very few serotype Ia strains gave positive results using these primers, whereas all were PCR positive using primers targeting the bca gene repetitive unit. These results were consistent with a previous report that a probe targeting the 5'-end of the bca gene hybridized with only one of nine serotype Ia strains, but a large bca gene probe, including the tandem repeat region, hybridized with all nine strains.

PCR specific for rib, alp2 and alp3 genes has not been described previously. The primer pairs we designed mainly targeted the 5'-ends of the genes and were chosen after comparing the gene heterogeneity with related gene sequences. We designed two or more primer pairs for each gene to check primer specificity by comparing the results of different PCRs targeting the same genes. Protein gene profiles "alp2" and "alp3" were distinguished on the basis of the alp2 and alp3 gene-specific PCR and/or two sequence heterogeneity sites in the amplicons of bcaS1/balA or bcaS2/balA.

To confirm the specificity of our primers, we used them to examine two reference panels and selected GBS isolates. The longest amplicons produced by PCR for each gene were sequenced, to provide maximal sequence information and to ensure that the inner primers were not located at strain heterogeneity sites. Our sequencing results confirmed the specificity of the primers. Two pairs of primers for each gene were compared, with similar results. Finally, six gene/region-specific primer pairs (including the one targeting the bca gene repetitive unit) were used to define protein antigen gene profiles for all 224 isolates.
The study showed that only one member of the surface protein gene family containing repetitive sequences (rib, bca, alp2 and alp3 genes) was present in any single isolate. However, all isolates containing the bag gene, which is not a member of this family, also contained either the bca gene (51/52) or the rib gene (1/52).

The bag gene was present in 23% of isolates, a similar proportion to that (19-22%) previously reported. In common with others, we found variations in the bag gene due to variable small internal repetitive sequences. These bag gene repetitive sequences were irregular (unlike those of the bca-rib gene family). Their role is not clear, but they are potentially useful molecular markers for epidemiological studies.

Our data show that some serotype III isolates (our MS serosubtypes III-1 and III-2) were closely associated with the rib gene, and others (our MS serosubtype III-3) with the alp2 gene. Serotype Ib was associated with the bca and bag genes and serotype V with the alp3 gene. However, as the relationship was not absolute, different combinations of cps serotype/serosubtype and protein gene profile identified many serovariants, which will be useful in epidemiological studies and in formulation of conjugate vaccines. Based on PCR only, we were able to divide our 224 isolates into 31 serovariants based on bag gene (B) groups, or 51 based on subgroups. Additional serovariants are likely to exist.
We found that the antisera to "c" and "R" protein antigens were not entirely specific for any particular protein gene. However, reaction with "c" antiserum mostly reflected the presence of genes encoding C alpha (bca gene) and related protein antigens (including at least the alp2 gene), and reaction with "R" antiserum the presence of genes encoding Rib (rib gene) and related proteins (including at least the alp3 gene and the rare presumed rib-like gene).

We have also investigated the presence of a number of mobile elements in different serotypes of GBS. Four different insertion sequences have been identified previously in GBS. Multiple copies of IS861 in some serotype III isolates were associated with increased capsule gene expression. We found IS861 in all serosubtype III-1 and III-2 isolates and in most serotype II and Ib isolates, but in few others. All IS861-containing isolates contained at least one additional mobile element.

Multiple copies of IS1381 have been found in a high proportion of GBS and other Streptococcus species, including S. pneumoniae, and have been used as probes for restriction fragment length polymorphism (RFLP) analysis of GBS in epidemiological studies (Tamura et al., 2000). We found IS1381 in 85% of isolates overall. It was present in all isolates of serosubtype III-1 but in none of serosubtypes III-2 or III-3. Our IS1381 sequences, from 24 isolates, were identical with each other but differed at several sites from that previously described (AF064785). The significance of these differences is unknown, but they emphasize the importance of confirming sequences from as many different strains as possible.

ISSa4 was first identified in a nonhemolytic GBS isolate, in which it caused insertional inactivation of the gene cylB, part of an ABC transporter involved in hemolysin production. In that study, only a small proportion (4%) of (mainly hemolytic) GBS isolates contained ISSa4, all of which had been isolated since 1996, and it was postulated that ISSa4 had been newly acquired by GBS. We also found ISSa4 in only a small proportion of isolates (7%), but it was present in similar proportions of clinical isolates obtained before 1996 (4 of 44) and during or after 1996 (11 of 162).
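Whether 4 of 44 and 11 of 162 are similar proportions can be checked with a standard two-sided Fisher exact test on the 2x2 table [[4, 40], [11, 151]] (ISSa4-positive and ISSa4-negative isolates, before 1996 and from 1996 onwards). The following sketch implements that test from the hypergeometric distribution using only the Python standard library; it is an illustration and not an analysis reported in the study.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities (margins fixed) of every table that
    is no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(k: int) -> float:
        # Probability that the top-left cell equals k, given the margins.
        return comb(row1, k) * comb(n - row1, col1 - k) / total

    observed = prob(a)
    k_min, k_max = max(0, col1 - (n - row1)), min(row1, col1)
    p = sum(prob(k) for k in range(k_min, k_max + 1) if prob(k) <= observed * (1 + 1e-9))
    return min(1.0, p)


# ISSa4-positive / ISSa4-negative isolates, before 1996 vs 1996 onwards.
print(round(fisher_exact_two_sided(4, 40, 11, 151), 3))
```

On Fisher's classic tea-tasting table [[3, 1], [1, 3]] the function returns 34/70 (about 0.486), the textbook two-sided value.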

IS1548 was first discovered in some hyaluronidase-negative GBS serotype III isolates, in which it caused insertional inactivation of the gene hylB (one of a cluster responsible for production of hyaluronidase, an important GBS virulence factor) (Granlund et al., 1998). A copy of IS1548 is also found downstream of the C5a peptidase gene (also associated with virulence) in isolates that contain it. Most IS1548-containing isolates were from patients with endocarditis, and it was postulated that inactivation of hyaluronidase production and/or some effect on C5a peptidase may allow GBS isolates to adhere to and survive on heart valves.

We found IS1548 in all serosubtype III-1 isolates, which represented 52% of the 58 serotype III isolates in our collection, from superficial (eight of 12) and normally sterile (22 of 46) specimens. The latter were from neonates (seven of 20), adults (three of six) and subjects of unspecified age (12 of 20) (data not shown). Although specific clinical data were unavailable, GBS endocarditis is uncommon and is likely to have been present in few, if any, of these subjects. Further study is required to elucidate the association of this insertion sequence with specific virulence factors and clinical syndromes.

We found GBSi1, a group II intron, in 19% of our 224 isolates overall; it was commonly associated with IS861, and its distribution varied with serotype/serosubtype. It was rarely found in serotypes other than II and III. It was present in more than 50% of serotype II isolates, including four that also contained IS1548. It was found in all serosubtype III-2 and III-4 isolates, in which IS1548 was not found, but in no serosubtype III-1 isolates (which did contain IS1548) or serosubtype III-3 isolates (which did not).

Our subdivision of GBS serotype III into four serosubtypes, based on differences within the cps gene cluster, was supported by corresponding differences in surface protein gene profiles and in the distribution of the five mobile elements described in this study. Although we did not test our isolates for hyaluronidase activity, it is likely that our serosubtype III-1, which expresses Rib protein and contains IS1548, IS861 and IS1381, corresponds with the hyaluronidase-negative subtype III-2 described by Bohnsack et al., 2001. Our serosubtype III-2 also expresses Rib protein and contains IS861 and GBSi1, and probably corresponds with subtype III-3 of Bohnsack et al., 2001. Serosubtypes III-3 and III-4 were represented by relatively few isolates. The former (in common with some serotype Ia isolates) expressed the C alpha-like protein 2 and contained no mobile elements (an otherwise uncommon finding). The latter is closely related to serotype II, with which it shares sequence homology in a section of the cps gene cluster and various surface protein profiles and mobile elements.

Thus, in summary, we have developed an alternative to conventional serotyping for GBS which is accurate and reproducible, can be performed by any laboratory with access to PCR/sequencing and, importantly, does not require panels of serotype-specific antisera, which are increasingly difficult to maintain. All isolates are serotypable, and sequencing of a relatively limited 790 bp region can provide additional serosubtyping information for MS III. The molecular methods we have described for serotype identification, together with protein profiling (protein antigen subtyping) and identification of mobile genetic elements (mobile genetic element subtyping), provide potentially useful markers for further phylogenetic and epidemiological studies of GBS, as well as comprehensive strain identification for the epidemiological and related studies that will be needed to monitor GBS isolates before and after the introduction of GBS conjugate vaccines.

The various features and embodiments of the present invention, referred to in individual sections above, apply, as appropriate, to other sections, mutatis mutandis. Consequently, features specified in one section may be combined with features specified in other sections, as appropriate.

All publications mentioned in the above specification are herein incorporated by reference. Various modifications and variations of the described methods and system of the invention will be apparent to those skilled in the art without departing from the scope and spirit of the invention. Although the invention has been described in connection with specific preferred embodiments, it should be understood that the invention as claimed should not be unduly limited to such specific embodiments. Indeed, various modifications of the described modes for carrying out the invention which are readily apparent to those skilled in molecular biology or related fields are intended to be within the scope of the following claims.
Table 1. GBS reference panels used in this study.

| Lab strain number | Source | Serotype | MS/serosubtype | GenBank accession number |
|---|---|---|---|---|
| Reference panel 1 (note 1) | | | | |
| 090 | Channing | Ia | Ia | AF332893 |
| H36B | Channing | Ib | Ib | AF332903 |
| 18RS21 | Channing | II | II | AF332905 |
| M781 | Channing | III | III-2 (note 3) | AF332896 |
| 3139 | Channing | IV | IV | AF332908 |
| CJB 111 | Channing | V | V | AF332910 |
| SS1214 | Channing | VI | VI | AF332901 |
| 7271 | Channing | VII | VII | AF332913 |
| JM9 130013 | Channing | VIII | VIII | |
| Reference panel 2 (note 2) | | | | |
| NZRM 908 (NCDC SS615) | ESR | Ia | Ia | AF332894 |
| NZRM 909 (NCDC SS618) | ESR | Ib | Ib | AF332904 |
| NZRM 910 (NCDC SS700) | ESR | Ic | Ia | AF332914 |
| NZRM 911 (NCDC SS619) | ESR | II | II | AF332906 |
| NZRM 912 (NCDC SS620) | ESR | III | III-3 (note 3) | AF332897 |
| NZRM 2217 (Prague 25/60) | ESR | Non-typable (R) | II | AF332907 |
| NZRM 2832 (Prague 1/82) | ESR | IV | IV | AF332909 |
| NZRM 2833 (Prague 10/84) | ESR | V | V | AF332911 |
| NZRM 2834 (Prague 118754) | ESR | VI | VI | AF332902 |

Notes.

1. Reference panel 1: supplied by Dr Lawrence Paoletti, Channing Laboratory, Boston, USA.

2. Reference panel 2: New Zealand Reference Medical Culture Collection strains supplied by Dr Diana Martin, ESR, Porirua, Wellington, New Zealand.

3. MS III serosubtypes based on sequence heterogeneity; see text for more detail.
Table 3. Specificity and expected lengths of amplicons using different oligonucleotide primer pairs.

| Primer pair* | Specificity | Length of amplicon (base pairs) |
|---|---|---|
| Sag59/Sag190 (a) | GBS (S. agalactiae) | 196 |
| CFBS/CFBA | GBS (S. agalactiae) | 241 |
| 16SS/23SA | GBS (S. agalactiae) | 433 |
| DSF2/DSR1 (a) | GBS (S. agalactiae) | 276 |
| cpsDS/cpsEA1 | serotypes Ia to VII | 449/458 |
| cpsES/cpsEA2 | serotypes Ia to VII | 424 |
| cpsES1/cpsEA3 | serotypes Ia to VII | 505 |
| cpsES2/cpsEFA | serotypes Ia to VII | 515 |
| cpsES3/cpsFA (b) | serotypes Ia to VII | 450 |
| cpsFS/cpsGA1 (b) | serotypes Ia to VII | 423 |
| cpsES3/cpsGA1 (b) | serotypes Ia to VII | 790 |
| cpsGS/cpsIA | serotypes Ia and III | 1672/1558 |
| cpsGS1/cpsIA | serotypes Ia and III | 1662/1548 |
| cpsGS/IacpsHA1 | serotype Ia | 1127 |
| cpsGS1/IacpsHA1 | serotype Ia | 1117 |
| IacpsHS/IacpsHA | serotype Ia | 296 |
| IacpsHS/IacpsHA1 | serotype Ia | 574 |
| IacpsHS1/cpsIA (c) | serotype Ia | 354 |
| cpsGS/IbcpsHA1 | serotype Ib | 1468 |
| cpsGS1/IbcpsHA1 | serotype Ib | 1458 |
| cpsGS/IbcpsIA | serotype Ib | 1660 |
| cpsGS1/IbcpsIA | serotype Ib | 1650 |
| IbcpsHS/IbcpsHA | serotype Ib | 282 |
| IbcpsHS1/IbcpsHA1 | serotype Ib | 349 |
| IbcpsHS2/IbcpsIA | serotype Ib | 347 |
| IbcpsIS/IbcpsIA1 (c) | serotype Ib | 523 |
| cpsGS/IIIcpsHA | serotype III | 1063 |
| cpsGS1/IIIcpsHA | serotype III | 1053 |
| IIIVIcpsHS/IIIcpsHA | serotype III | 543 |
| IIIcpsHS/cpsIA (c) | serotype III | 641 |
| cpsGS/IVcpsHA | serotype IV | 1372 |
| cpsGS1/IVcpsHA | serotype IV | 1362 |
| cpsGS/IVcpsMA | serotype IV | 1686 |
| cpsGS1/IVcpsMA | serotype IV | 1676 |
| [further rows for serotypes IV, V and VI are illegible in the source] | | |
| cpsGS/VIcpsIA | serotype VI | 1527 |
| cpsGS1/VIcpsIA | serotype VI | 1517 |
| VIcpsHS/VIcpsHA1 (c) | serotype VI | 327 |
| VIcpsHS1/VIcpsIA | serotype VI | 360 |

Notes.

*See Table 2 for primer sequences and Figure 1 for some primer sites.

Primers used in the algorithm for molecular serotype identification (Figure 2): a. to identify GBS; b. for sequencing; c. for MS-specific PCR.
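Table 5. Comparison of the results of conventional serotyping (CS) and molecular serotype identification (MS)/subtyping of 206 clinical GBS isolates.

Rows: CS; columns: MS/serosubtype.

| CS | Ia | Ib | II | III-1 (note 1) | III-2 (note 1) | III-3 (note 1) | III-4 (note 1) | IV | V | VI | VIII |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Ia | 38 | | | | | | | | | | |
| Ib | | 30 | | | | | | | | | |
| II | | | 25 | | | | | | | | |
| III | | | | 27 | 20 | 4 | 3 | | | | |
| IV | | | | | | | | 7 | | | |
| V | | | | | | | | | 31 | | |
| VI | | | | | | | | | | 2 | |
| VIII | | | | | | | | | | | 1 |
| NT | 2 | 5 | 1 | 3 | 1 | | | | 5 | 1 | |
| Total (206) (note 2) | 40 | 35 | 26 (note 2) | 30 | 21 (note 2) | 4 | 3 | 7 | 36 | 3 | 1 |

Notes.

1. For details of MS III serosubtypes see text.

2. One mixed culture was included as two separate isolates (one serotype II, one subtype III-2).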
Table 7. Specificity and expected lengths of amplicons using different primer pairs.

| Primer pair* | Specificity | Length of amplicon (base pairs) | Protein profile code |
|---|---|---|---|
| IgAagGBS/RIgAagGBS | bag | (illegible in source) | (illegible in source) |
| IgAS1/IgAA1 (name partly illegible in source) | bag | 303-591 | B |
| GBS1360S/GBS1937A | bag | 652 | B |
| GBS1717S/GBS1937A | bag | 292 | B |
| bcaS1/bcaA | 5'-end of bca | 390 | A |
| bcaS2/bcaA | 5'-end of bca | 342 | A |
| bcaRUS/bcaRUA | bca repetitive unit/bca repetitive unit-like region | 235 | a/as |
| bcaS1/balA | alp2/alp3 | 446 | alp2 or alp3 |
| bcaS2/balA | alp2/alp3 | 398 | alp2 or alp3 |
| balS/balA | alp2/alp3 | 302 | alp2 or alp3 |
| bal23S1/bal2A1 | alp2 | 334 | alp2 |
| bal23S2/bal2A1 | alp2 | 253 | alp2 |
| bal23S1/bal2A2 | alp2 | 426 | alp2 |
| bal23S2/bal2A2 | alp2 | 345 | alp2 |
| bal23S1/bal3A | alp3 | 321 | alp3 |
| bal23S2/bal3A | alp3 | 240 | alp3 |
| ribS1/ribA3 (#) | rib/rib-like | 355 | R/r |
| ribS2/ribA1 | rib | 194 | R |
| ribS2/ribA2 | rib | 225 | R |
| ribS2/ribA3 | rib | 333 | R |

Notes.

*See Table 6 for primer sequences.

#For sequencing use only; not entirely specific for rib gene (see text for more detail).
Table 8. Genetic groups and subgroups of bag gene (C beta protein gene) based on amplicon length (using primers IgAagGBS/RIgAagGBS) and sequence heterogeneity.

| Group or subgroup | N | Amplicon length (bp) | GenBank accession number | No. of different sites compared with (cf.) main group | Molecular serotype/serosubtypes* |
|---|---|---|---|---|---|
| B1 | 19 | 532 | X58470 | | 17 = Ib; 2 = II |
| B1a | 1 | 532 | AF362686 | 1 (cf. B1) | Ib |
| B2 | (illegible) | (illegible) | AF362687 | | Ib, II, III-4 |
| B3 | 2 | 586 | AF362688 | | 2 = Ib |
| B3a | 1 | (illegible) | AF362689 | 4 (cf. B3) | V |
| B3b | 1 | (illegible) | AF362690 | 21 (cf. B3) | VI |
| B3c | 1 | 586 | AF362691 | 24 (cf. B3) | Ib |
| B4 | 8 | 604 | AF362692 | | 4 = Ib; 4 = II |
| B4a | 1 | (illegible) | AF362693 | 1 (cf. B4) | II |
| B4b | 2 | 604 | AF362694 | 2 (cf. B4) | 2 = Ib |
| B5 | (illegible) | 622 | AF362695 | | Ia, VI |
| B5a | 1 | 622 | AF362696 | 2 (cf. B5) | Ia |
| B6 | 1 | 640 | AF362697 | | Ib |
| B7 | 1 | 658 | AF362698 | | Ib |
| B7a | 1 | 658 | AF362699 | 34 (cf. B7) | VI |
| B8 | 1 | 712 | AF362700 | | Ib |
| B9 | 2 | 748 | AF362701 | | 2 = II |
| B9a | 1 | 748 | AF362702 | 13 (cf. B9) | Ib |
| B10 | 2 | 820 | AF362703 | | 2 = Ib |
| B11 | 1 | 838 | AF362704 | | Ib |

Note.

*See Table 9 for further details of serotype/serosubtype relationships with protein antigens.
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Table 9. The relationship between GBS protein gene profiles and capsular 
polysaccharide (cps) molecular serotypes/serosubtypes. 



OtJUJiyptJ/ 

COrAcnnfirna * 
sttlfJallDLypt? 


IN — 


None 


A a 


/vau 


K 


alp 

Q 
O 


a 


as 


alp2as 




Ka 


To 
let 


*±o 






z 






OK 

oo 


o 
o 


o 
o 






ID 


07 
O / 




1 


oo 




1 












TT 
It 


zy 




Q 
O 


1U 


o 

o 


z 


5 








1 


TTT 1 
111-1 










30 














TTT n 


ZZ 








22 














TTT-Q 
lll-o 


o 
















5 






III-4 


3 






1 




1 






1 






IV 


9 








1 




8 










V 


38 


1 






1 


35 








1 




VI 


5 




1 


3 






1 










VII 


1 










1 












VIII 


2 


1 








1 












Total 


224 


2 


5 


51 


62 


41 


49 


3 


9 


1 


1 



Note.

*See text for an explanation of the cps serosubtypes and Table 7 for an explanation of the protein antigen gene profile codes.
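The Total row of Table 9 lends itself to a simple consistency check. Assuming that every strain is assigned to exactly one protein gene profile (an assumption, not something the table states), the column totals 2, 5, 51, 62, 41, 49, 3, 9, 1 and 1 should add up to the 224 strains, and each legible row's cells should add up to that row's total. The sketch below encodes only the rows that can be read; the names `COLUMN_TOTALS`, `LEGIBLE_ROWS` and `inconsistent_rows` are illustrative.

```python
# Values read from Table 9 (only the legible rows are included).
ALL_STRAINS = 224
COLUMN_TOTALS = [2, 5, 51, 62, 41, 49, 3, 9, 1, 1]

# serotype/serosubtype -> (row total, legible non-empty cells in order)
LEGIBLE_ROWS = {
    "III-4": (3, [1, 1, 1]),
    "IV": (9, [1, 8]),
    "V": (38, [1, 1, 35, 1]),
    "VI": (5, [1, 3, 1]),
    "VII": (1, [1]),
    "VIII": (2, [1, 1]),
}


def inconsistent_rows(rows):
    """Return the rows whose non-empty cells do not sum to the row total."""
    return [name for name, (total, cells) in rows.items() if sum(cells) != total]


if __name__ == "__main__":
    # Assumes one profile per strain, so the column totals should cover all strains.
    print(sum(COLUMN_TOTALS) == ALL_STRAINS)
    print(inconsistent_rows(LEGIBLE_ROWS))
```

If either check failed, that would point to a misread cell rather than to a typing result.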
Table 10. Oligonucleotide primers used in this study.

Primer | Target | Tm (°C)¹ | GenBank accession number(s) | Sequence²
IS861S | IS861 | 77.4 | M22449 | 445 GAG AAA ACA AGA GGG AGA CCG AGT AAA ATG GGA CG 479
IS861A1 | IS861 | 77.3 | M22449 | 831 CAC GAT TTC GCA GTT CTA AAT AAA TCC GAC GAT AGC C 795
IS861A2 | IS861 | 76.1 | M22449 | 1020 CAA ACT CCG TCA CAT CGG TAT AGC ACT TCT CAT AGG 985
IS1548S | IS1548 | 76.5 | Y14270 | 143 CTA TTG ATG ATT GCG CAG TTG AAT TGG ATA GTC GTC 178
IS1548S1 | IS1548 | 77.0 | Y14270 | 539 GTT TGG GAC AGG TAG CGG TTG AGG AGA AAA GTA ATG 574
IS1548A1 | IS1548 | 77.0 | Y14270 | 574 CAT TAC TTT TCT CCT CAA CCG CTA CCT GTC CCA AAC 539
IS1548A2 | IS1548 | 70.3 | Y14270 | 915 CCC AAT ACC ACG TAA CTT ATG CCA TTT G 888
IS1548A3 | IS1548 | 78.0 | Y14270 | 930 CGT GTT ACG AGT CAT CCC AAT ACC ACG TAA CTT ATG CC 893
IS1381S1 | IS1381 | 80.1 | AF064785/AF367974 | 272/818 CTT ATG AAC AAA TTG CGG CTG ATT TTG GCA TTC ACG 307/853
IS1381S2 | IS1381 | 81.7 | AF064785/AF367974 | 497/1040 GGC TCA GGC GAT TGT CAC AAG CCA AGG GAG 526/1069
IS1381A | IS1381 | 73.1 | AF064785/AF367974 | 881/1424 CTA AAA TCC TAG TTC ACG GTT GAT CAT TCC AGC 849/1392
ISSa4S | ISSa4 | 78.5 | AF165983 | 326 CGT ATC TGT CAC TTA TTT CCC TGC GGG TGT CTC C 359
ISSa4A1 | ISSa4 | 75.2 | AF165983 | 639 GCC GAT GTC ACA ACA TAG TTC AGG ATA TAG CCA G 606
ISSa4A2 | ISSa4 | 74.5 | AF165983 | 780 CGT AAA GGA GTC CAA AGA TGA TAG CCT TTT TGA ACC 745
GBSi1S1 | GBSi1 | 78.6 | AJ292930 | 721 CAT CTC GGA ACA ATA TGC TCG AAG CTT ACA AGC AAG TG 758
GBSi1S2 | GBSi1 | 77.3 | AJ292930 | 789 GGG GTC ACT ATC GAG CAG ATG GAT GAC TAT CTT CAC 824
GBSi1A1 | GBSi1 | 83.9 | AJ292930 | 1058 AAT GGC TGT TTC GCA GGA GCG ATT GGG TCT GAA CC 1024
GBSi1A2 | GBSi1 | 80.5 | AJ292930 | 1161 CCA GGG ACA TCA ATC TGT CTT GCG GAA CAG TAT CG 1127



Notes.

1. The primer Tm values were provided by the primer synthesiser (Sigma-Aldrich).

2. Numbers give the base positions at which each primer sequence starts and finishes; position 1 corresponds to position 1 of the sequence deposited under the corresponding GenBank accession number.
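Note 2 implies a simple consistency check. A primer listed from position s to position f should be |f - s| + 1 bases long. The sketch below applies this check to a hand-transcribed subset of Table 10. It also confirms that IS1548A1 is the exact reverse complement of IS1548S1, since both occupy positions 539 to 574 of Y14270. The dictionary and helper names are illustrative and not part of the original method.

```python
# Hand-transcribed subset of Table 10: name -> (start, finish, sequence).
# Positions refer to the GenBank sequence named in the table (note 2).
PRIMERS = {
    "IS861S": (445, 479, "GAGAAAACAAGAGGGAGACCGAGTAAAATGGGACG"),
    "IS861A1": (831, 795, "CACGATTTCGCAGTTCTAAATAAATCCGACGATAGCC"),
    "IS1548S1": (539, 574, "GTTTGGGACAGGTAGCGGTTGAGGAGAAAAGTAATG"),
    "IS1548A1": (574, 539, "CATTACTTTTCTCCTCAACCGCTACCTGTCCCAAAC"),
    "IS1548A2": (915, 888, "CCCAATACCACGTAACTTATGCCATTTG"),
    "GBSi1A2": (1161, 1127, "CCAGGGACATCAATCTGTCTTGCGGAACAGTATCG"),
}

# Translation table mapping each unambiguous base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def span_length(start, finish):
    """Bases covered by an inclusive start..finish interval, in either direction."""
    return abs(finish - start) + 1


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def length_mismatches(primers):
    """Names of primers whose sequence length disagrees with their positions."""
    return [name for name, (start, finish, seq) in primers.items()
            if len(seq) != span_length(start, finish)]


if __name__ == "__main__":
    print(length_mismatches(PRIMERS))  # an empty list means every length agrees
    print(reverse_complement(PRIMERS["IS1548S1"][2]) == PRIMERS["IS1548A1"][2])
```

Reverse primers are listed with the larger coordinate first, so the sketch uses the absolute difference rather than assuming that start is less than finish.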
Table 11. Specificity and expected amplicon lengths for the different oligonucleotide primer pairs.

Primer pair* | Specificity | Length of amplicon (bp)
IS861S/IS861A1 | IS861 | 387
IS861S/IS861A2 | IS861 | 576
IS1548S/IS1548A1 | IS1548 | 432
IS1548S/IS1548A2 | IS1548 | 773
IS1548S/IS1548A3 | IS1548 | 788
IS1548S1/IS1548A2 | IS1548 | 377
IS1548S1/IS1548A3 | IS1548 | 392
IS1381S1/IS1381A | IS1381 | 610/607#
IS1381S2/IS1381A | IS1381 | 385
ISSa4S/ISSa4A1 | ISSa4 | 314
ISSa4S/ISSa4A2 | ISSa4 | 455
GBSi1S1/GBSi1A1 | GBSi1 | 338
GBSi1S1/GBSi1A2 | GBSi1 | 441
GBSi1S2/GBSi1A1 | GBSi1 | 270
GBSi1S2/GBSi1A2 | GBSi1 | 373

Notes.

*See Table 10 for primer sequences.

# Our sequencing result (GenBank accession number AF367974) was 3 bp shorter than the sequence previously described by Tamura et al., 2000 (GenBank accession number AF064785).
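The lengths in Table 11 follow from the primer coordinates in Table 10. When both primers are numbered on the same reference sequence, the expected amplicon runs from the 5' base of the forward primer to the 5' base of the reverse primer inclusive, giving (reverse start - forward start + 1). The minimal sketch below assumes this inclusive-coordinate convention. The IS1381 primers have one coordinate on AF064785 and one on AF367974, so the same arithmetic yields the 610/607 bp pair. It also gives 314 bp for ISSa4S/ISSa4A1. The dictionaries and function names are illustrative.

```python
# 5' start positions from Table 10, on the reference sequence named there.
FORWARD = {"IS861S": 445, "IS1548S": 143, "IS1548S1": 539,
           "ISSa4S": 326, "GBSi1S1": 721, "GBSi1S2": 789}
REVERSE = {"IS861A1": 831, "IS861A2": 1020, "IS1548A1": 574,
           "IS1548A2": 915, "IS1548A3": 930, "ISSa4A1": 639,
           "ISSa4A2": 780, "GBSi1A1": 1058, "GBSi1A2": 1161}
# IS1381 primers carry one position per reference: (AF064785, AF367974).
IS1381 = {"IS1381S1": (272, 818), "IS1381S2": (497, 1040),
          "IS1381A": (881, 1424)}


def amplicon_length(fwd_start, rev_start):
    """Inclusive distance from the forward to the reverse primer's 5' base."""
    if rev_start < fwd_start:
        raise ValueError("reverse primer must lie downstream of the forward primer")
    return rev_start - fwd_start + 1


def pair_length(fwd, rev):
    """Expected amplicon length for a forward/reverse pair on one reference."""
    return amplicon_length(FORWARD[fwd], REVERSE[rev])


def is1381_lengths(fwd):
    """Expected IS1381 amplicon with IS1381A on AF064785 and on AF367974."""
    return tuple(amplicon_length(f, r)
                 for f, r in zip(IS1381[fwd], IS1381["IS1381A"]))


if __name__ == "__main__":
    print(pair_length("IS861S", "IS861A1"))   # 387, as in Table 11
    print(is1381_lengths("IS1381S1"))         # (610, 607)
```

In the IS1381 case, the offset between the two references is 546 bp at IS1381S1 but 543 bp at IS1381S2 and IS1381A. This places the 3-bp difference noted under Table 11 between the two forward primer sites.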
Table 12. Relationship between mobile genetic elements and capsular polysaccharide serotypes, serotype III subtypes and surface protein gene profiles.

[Table body largely illegible in the source. Rows list serotype, serotype III subtype and surface protein gene profile, with subtotals per serotype; legible subtotals are serotype IV, 9 strains, and serotype V, 38 strains. Total row: 224 strains; 124 (55), 41 (18), 190 (85), 15 (7), 43 (19) and 10 (4) across the mobile genetic element columns.]



Note.

A: 5' end of the bca gene (C alpha protein);

a: bca gene repetitive unit or bca gene repetitive unit-like sequence (multiple-band amplicon);

as: bca gene repetitive unit or bca gene repetitive unit-like sequence (single-band amplicon);

B: C beta/IgA-binding protein (bac) gene;
R: Rib protein (rib) gene;
alp2: C alpha-like protein 2 (alp2) gene;
alp3: C alpha-like protein 3 (alp3) gene;
r: assumed Rib-like protein gene.
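In the Total row of Table 12, the bracketed figures can be read as percentages of the 224 strains typed. The short check below tests that reading by rounding each count to the nearest whole per cent. The pairing of each count with its bracketed value and the helper names are assumptions made for illustration.

```python
TOTAL_STRAINS = 224
# (count, bracketed figure) for each mobile genetic element column of the Total row
TOTAL_ROW = [(124, 55), (41, 18), (190, 85), (15, 7), (43, 19), (10, 4)]


def as_percent(count, total):
    """Share of all strains, rounded to the nearest whole per cent."""
    return round(100 * count / total)


def matches_percentages(row, total):
    """True if every bracketed figure equals the rounded percentage of its count."""
    return all(as_percent(count, total) == shown for count, shown in row)


if __name__ == "__main__":
    print(matches_percentages(TOTAL_ROW, TOTAL_STRAINS))
```

None of these six counts falls exactly on a half per cent, so Python's round-half-to-even behaviour does not affect the result.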



Figure 1. Multiple sequence alignments of the region from the 3' end of cpsD through cpsE and cpsF to the 5' end of cpsG for reference strains of serotypes Ia to VII.

[Alignment panels, positions 1 to 2200: serosubtypes Ia-1, Ia-2, III-1, III-2 and III-3 and serotypes Ib, II/III-4, IV, V, VI and VII, each compared with a consensus line. Primer sites cpsDS, cpsES1, cpsEA2, cpsES2, cpsEA3, cpsES3, cpsEFA, cpsFS, cpsFA and cpsGS, and the cpsE/cpsF and cpsF/cpsG gene boundaries, are marked on the consensus.]
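The consensus line in an alignment such as Figure 1 records, for each column, the base shared by most of the aligned sequences. The minimal sketch below assumes a simple majority rule, because the rule actually used to draw the figure is not stated. It also assumes a display in which each strain shows only the positions where it departs from the consensus, in lower case. The function names are illustrative.

```python
from collections import Counter


def majority_consensus(aligned):
    """Column-wise majority base of equal-length aligned sequences.

    Gap characters are counted like bases. Ties go to the character seen
    first in the column, because Counter.most_common keeps insertion order
    among equal counts.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    width = len(aligned[0])
    if any(len(s) != width for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*aligned))


def differences(seq, consensus):
    """Show '-' where seq matches the consensus and the lower-case base elsewhere."""
    return "".join("-" if a == b else a.lower() for a, b in zip(seq, consensus))


if __name__ == "__main__":
    strains = ["ACGT", "ACGA", "ACTT"]
    cons = majority_consensus(strains)
    print(cons)
    for s in strains:
        print(differences(s, cons))
```

A majority rule is the simplest choice. Alignment tools often offer alternatives, such as threshold-based or IUPAC ambiguity consensus.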
Figure 3. Multiple sequence alignments of the cpsG-cpsH-cpsI/M gene sequences for serotypes Ia, Ib, II, III, IV, V and VI (start and stop codons are highlighted).



Serotype IV 
Serotype V 
Serotype la 
Serotype lb 
Serotype III 
Serotype VI 
Consensus 



Serotype IV 
Serotype V 
Serotype la 
Serotype lb 
Serotype III 
Serotype VI 
Consensus 



1 50 

ATGATTTTTG TCACAGTGGG G AC AC AT GAA CAGCAGTTCA ACCGTCTTAT 

ATGATTTTTG TCACAGTGGG GAC AC AT GAA CAGCAGTTCA ACCGTCTTAT 

ATGATTTTTG TCACAGTGGG GAC AC AT GAA CAGCAGTTCA ACCGTCTTAT 

ATGATTTTTG TCACAGTAGG GACAC AT GAA CAGCAGTTCA ACCGTCTTAT 

ATGATTTTTG TCACAGTGGG GACAC AT GAA CAGCAGTTCA ACCGTCTTAT 

ATGATTTTTG TCACAGTGGG GAC AC AT GAA CAGCAGTTCA ACCGTCTTAT 

★ ★★**★■*■★★★ *******_** ********** ********** ********** 

cpsG 

51 100 

TAAAGAAGTT GATAGATTAA AAGGGACAGA T GCTAT T GAT CAAGAAGTGT 

TAAAGAAGTT GATAGATTAA AAGGGACAGG TGCTATT GAT CAAGAAGTGT 

TAAAGAAGTT GATAGATTAA AAGGGACAGG TGCTATT GAT CAAGAAGTGT 

TAAAGAAGTT GATAGATTAA AAGGGACAGG TGCTATTGAT CAAGAAGTGT 

TAAAGAAGTT GATAGATTAA AAGGGACAGG TGCTATTGAT CAAGAAGT GT 

TAAAGAAGTT GATAGATTAA AAGGGACAGG TGCTATTGAT CAAGAAGTGT 

* + + ★ -A.**.*.*.*****. ***** ** **_ ********** ********** 



Serotype IV 
Serotype V 
Serotype la 
Serotype lb 
Serotype III 
Serotype VI 
Consensus 



101 

T CAT T C AAAC 
TCATTCAAAC 
TCATTCAAAC 
TCATTCAAAC 
TCATTCAAAC 
TCATTCAAAC 



GGGTTACTCA 
GGGTTACTCA 
GGGTTACTCA 
GGGTTACTCA 
GGGTTACTCA 
GGGTTACTCA 



GACTTTGAAC 
GACTTTGAAC 
GACTTTGAAC 
GACTTTGAAC 
GACTTCGAAC 
GACTTTGAAC 



CTCAGAATTG 
CTCAGAATTG 
CTCAGAATTG 
CTCAGAATTG 
CTCAGAATTG 
CTCAGAATTG 



150 

TCAGTGGTCA 
TCAGTGGTCA 
TCAGTGGTCA 
TCAGTGGTCA 
TCAGTGGTCA 
TCAGTGGTCA 



+ + + + *********** •*■ + ■*•■*- + _-*•★★★ + + 



Serotype IV 
Serotype V 
Serotype la 
Serotype lb 
Serotype III 
Serotype VI 
Consensus 



151 

AAATTTCTCT 

AAATTTCTCT 

AAATTTCTCT 

AAATTTCTCT 

AAATTTCTCT 

AAATTTCTCT 
■*•* + + + ★*★■*■★ 



CAT AT GAT G A 
CAT AT GAT G A 
CAT AT GAT GA 
CAT AT GAT GA 
CAT AT GAT GA 
CAT AT GAT G A 



TATGAACTCT 
TATGAACTCT 
TATGAACTCT 
TATGAACTCT 
TATGAACTCT 
TATGAACTCT 



T AC AT GAAAG 

T AC AT GAAAG 

T ACAT GAAAG 

T AC AT GAAAG 

T ACAT GAAAG 

T AC AT GAAAG 
★★★★++★+** 



200 

AAGCT GAGAT 
AAGCT GAGAT 
AAGCT GAGAT 
AAGCT GAGAT 
AAGCT GAGAT 
AAGCT GAGAT 



Serotype IV 
Serotype V 
Serotype la 
Serotype lb 
Serotype III 
Serotype VI 
Consensus 



201 

TGTTATCACA 
TGTTATCACA 
TGTTATCACA 
TGTTATCACA 
TGTTATCACA 
TGTTATCACA 



CATGGCGGTC 
CATGGCGGCC 
CATGGCGGTC 
CACGGCGGTC 
CATGGCGGCC 
CATGGCGGCC 



CAGCGACGTT 
CAGCGACGTT 
CAGCGACGTT 
CAGCAACGTT 
CAGCGACGTT 
CAGCGACGTT 



TATGAATGCA 
TATGAATGCA 
TAT GAAT GCA 
TATGAATGCA 
TAT GT CAGTT 
TAT GT CAGTT 



★ **•*•■*•*•*• -A- * * + + _+* + ★★_ 



250 

GTTTCTAAAG 

GTTTCTAAAG 

GTTTCTAAAG 

GTTTCTAAAG 

ATTTCTTTAG 

ATTTCTTTAG 
_*•***•* ★ + 



1 



Serotype IV 
Serotype V 
Serotype la 
Serotype lb 
Serotype III 
Serotype VI 
Consensus 



251 

GGAAAAAAAC 
GAAAAAAAAC 
GGAAAAAAAC 
GGAAAAAAAC 
GGAAATT AC C 
GGAAATT AC C 



TATTGTGGTT 
TATTGTGGTT 
TATTGTGGTT 
TATTGTGGTT 
AGTTGTTGTT 
AGTTGTTGTT 



CCTAGACAAG 
CCTAGACAAG 
CCTAGACAAG 
CCTAGACAAG 
CCTAGGAGAA 
CCCAGGAGAA 



AACAGTTTGG 
AACAGTTTGG 
AACAGTTTGG 
AACAGTTTGG 
AGCAGTTTGG 
AGCAGTTTGG 



300 

AGAGCATGTG 
AGAGCATGTG 
AGAGCATGTG 
AGAGCATGTG 
T GAAC AT AT C 
T G AAC AT AT C 



Serotype IV 
Serotype V 
Serotype la 
Serotype lb 
Serotype III 
Serotype VI 
Consensus 



301 

AAT AAT CAT C 
AAT AAT CAT C 
AAT AAT CAT C 
AAT AAT CAT C 
AAT GAT CAT C 
AAT GAT CAT C 



AGGTGGATTT 
AGGTGGACTT 
AGGTGGATTT 
AGGTGGATTT 
AAATACAATT 
AAATACAATT 
* * 



TGTTAATAAG 
TGTTAATAAG 
TTT GAAAGAG 
TTTGAAAGAG 
TTTAAAAAAA 
TTTAAATTCG 



GTAAAAACAA 
GTAAAAACAA 
TTATTCTTGA 
TTATTCTTGA 
ATTGCCCACC 
ATTGCCCACC 



350 

TGTATAATTT 
TGTATAATTT 
AAATT GAATT 
AAT AT GAGTT 
TGTATCCCTT 
TGTATCCCTT 



Serotype IV 
Serotype V 
Serotype la 
Serotype lb 
Serotype III 
Serotype VI 
Consensus 



351 

TGATATCGTT 
TGATATCGTT 
AGATTATATT 
AGATTATATT 
GGCTTGGATT 
GGCTTGGATT 



GTAGATATTG 
GTAGATATTG 
TTGAATATCA 
T T G AAT AT C A 
GAAGAT GT AG 
GAAGATGTAG 



AAAGGTTACA 
AAAGGTTACA 
GTGAATTAGA 
GT GAATTAGA 
ATGGACTTGC 

ATGGACTTGC 
★ 



AAATGTAGTC 
AAATGTAGTC 
GAATATTATT 
GAATATTATT 
GGAAGCGTT . 
GGAAGCGTT • 



400 

TATGAGGGGA 
TAT GAGGGAA 
AAGGAAAAAA 
AAGGAAAAAA 
. . GAAAAGGA 
. . GAAAAGGA 



Serotype IV 
Serotype V 
Serotype la 
Serotype lb 
Serotype III 
Serotype VI 
Consensus 



401 

C GAT G AAT C G 

TGATGAATCG 

AT AT AT CT AC 

AT AT AT CT AC 

ATATAGCTAC 

ATATAGCTAC 
+ 



TCCGTTTTTA 
TCCGTTTTTA 
TAGTAAAGTA 
TAGTAAAGTA 
AGAAAAATAT 
AGAAAAATAT 



GAAACTAACA 

GAAACTAATA 

AT AT C AC AAA 

AT AT CACAAA 

CAGGGAAATA 

CAGGGAAATA 
+ 



450 

GAAGTAATTT TATT 

GTAGTAATTT TATT 

ACAATGATTT TTGTTTCTCT 
ACAAT GATTT TTGTTCCTCT 

AT GAT AT GT T TTGT 

AT GAT AT GTT TTGT 



Serotype IV 
Serotype V 
Serotype la 
Serotype lb 
Serotype III 
Serotype VI 
Consensus 



451 500 

GAAGAA TTTAAGGTAA TATTAAAGGA 

GAAGAA TTTAAGGTAA TATTAAAGGA 

TTCAAAAATG AACATTTCAT AAACTATTTG AATAAATATA TTTTGTTGGA 
TTCAAAAATG AAC . . TTTCT AAACTATTTG AATAAATATA TTTTGTTGGA 

CATA AATTAGAAAA AATTATAGGT 

CATA AATTAGAAAA AATTATAGGT 



Serotype IV 
Serotype V 
Serotype la 
Serotype lb 
Serotype III 
Serotype VI 
Consensus 



501 

GTTGTGTGAT 
GTTGTGCGAT 
GAAAAAAATT 
GAAAAAAATT 
GAAATATGAG 
GAAATATGAG 



GAAA AT C AAT AAA AACTCTTTAT 

GAAA AT C AAT AAA AACTCTTTAT 

GAAATTAACA TATCAATCCA AAGTATTTGT 
GAAATTAAC. TATCAATCCA AAGTATTTGT 
GAAAT. . . .A TCTAGATTTA GATT ATT CT T 
GAAAT A TCTAGATTTA GATTATTCTT 



550 

TTT AT ATT GC 
TTT AT ATT GC 
TAATAGGAGG 
TAATAGGAGG 
TAT TTT AT GC 
TATTTTAT GC 



k- * ** * _★_ + _** ★ ★ ★_ 



2 



551 600 

AATATTTTTA GTTAATTTTT TTAAAT CACT AGGTTTAGGA GAGGGGAACT 

AATATTTTTA GTTAATTTTT TTAAAT CACT GGGTTTAGGC GAGGGAAACT 

AATTTTCGCT TTAACCCTAT TTTCAAAGCC AATGCAACTT TTGTTACTTT 

AATTTTCGCT TTAACCCTAT TTTCAAAGCC AATGCAACTT TTGTTACTTT 

TCTTTGGGTA CTTATTTTAG TACCAAACCA AT GGTAT CAG TTTTTAATTA 

TCTTTGGGTA CTTATTTTAG TACCAAACCA AT GGTAT CAG TTTTTAATTA 

— ★ — + ★ ★ ★ 

cpsH 

601 * 650 

CAACTTACAA AATAGT GAT G TTTGTTGCAA TCTTCTTGTG TGGAATAAAA 

CAGCTTACAA AATAGT GAT G TTAGTTGCAA TTTTACTGTG TGGAATAAAA 

TAGCATTAAT AGTTTTACTT ATTTGTAGTA GTTATAAGAA AAAAATGAAA 

TAGCATTAAT AGTTTTACTT ATTTGTAGTA GTTATAATGA AAAAATGAAA 

TTACCATTAT AGTTCTATTA TTACTTTGGA AGAGT GAGTT TAGAAT . . .A 

TT AC CATTAT AGTTCTATTA TTACTTTGGA AGAGT GAGTT TAGAAT... A 
★ ★ _* * ★ * 



Serotype IV 
Serotype V 
Serotype la 
Serotype lb 
Serotype III 
Serotype VI 
Consensus 



Serotype IV 
Serotype V 
Serotype la 
Serotype lb 
Serotype III 
Serotype VI 
Consensus 



Serotype IV 
Serotype V 
Serotype la 
Serotype lb 
Serotype III 
Serotype VI 
Consensus 



Serotype IV 
Serotype V 
Serotype la 
Serotype lb 
Serotype III 
Serotype VI 
Consensus 



Serotype IV 
Serotype V 
Serotype la 
Serotype lb 
Serotype III 
Serotype VI 
Consensus 



Serotype IV 
Serotype V 
Serotype la 
Serotype lb 
Serotype III 
Serotype VI 
Consensus 



651 

TTTTTA. . . . 

TTTTTA. . . . 

TTTTTATATA 

TTTTTAAATA 

TCTATAAGCA 

TCTATAAGCA 



. . TTAGATAG 
. .TTAGATAG 
TGGCTGAAAT 
TGGCTGAAAT 
ATTCTTCAAT 
ATTCTTCAAT 



CCTTTATTTT 
CCTTTATTTT 
TTTTTTCATT 
TTTTTTCATT 
ACTATTTCTG 
ACT ATT T CT G 



GAAAGAAGAA ■ 

GAAAGAAGAA 

GTATTTTATA 

GTATTTTATA 

CTTTGGTTAT 

CTTTGGTTAT 



700 

AACTCGTTAT 
AACTCGTGAT 
T CAT T TATTT 
TGGTTTATTT 
TTATTTATTT 
TTATTTATTT 



800 

TTTAATTTTT 
TTTAATTTTT 
AGTGGCTTTG 
CTTATTATTT 
TTTATTTTTT 
TTTATTTTTT 



701 750 
CATCTTTTTA TTATTTATTG CGACCATTTT GAATTTATTC TTTGTTCATA 
CATCTTTTTA TTATTTATCG CGACCATTTT GAATTTATTC TTTGTTCATA 
AACTTCAATA TTGCTACATT CTTTGTTTAA AACTCCTGAT TTT GATAGAA 
AGTATCAATA GT AT TAAATT CGTTATTTAG AAGTCCAGAA TTTCATAGAG 
ATTT GCAATA CT CAT T AGAG GTACT CAAGA GGATATAACG TTTCAGCGAT 
ATT T GCAAT A CT CAT TAGAG GTACT CAAGA GGATATAACG TTTCAGCGAT 
★* ★ ★ + + + 

751 

AGGTTACTTT TATATTAA C 

AGGTTACTTT TATATTAA C 

TTTTAGCAGC TTTTAACTCG TTGATTATCG GTATAGTATC 
TCATTGCTGC ATTCAATTCA CT GGCAGT AG GGGTTGTGTC 
TTATTGCTGA GCTATTAAAA CTAATTAGTA CAGGATATGC 

TTATTGCTGA GCTATTAAAA CTAATTAGTA CAGGATATGC 

★ + ★ ★ _ 



801 850 
TTTCTAGCAT TAAAGGATAT CTCTCTAAAA AAAGCTTTCT CTATAATAAT 
TTTCTAGCAT TAAAGGACAT CTCTCTAAAA AAAGCTTTCT CTATAATAAT 
AAACGGTGGT ATAAGAATAC AACTTT GGAG TTAGATAAAA TATTAAAAGC 
TACCATTACT ATAAGAATAC TAATATTGAA TTAACAAAAT TGCTAAAATC 
TATAATTATT ATAGAAAAGC TGATTTTAAT AGTTCAGTTG TAAGGAAT GT 
TATAATTATT ATAGAAAAGC TGATTTTAAT AGTTCAGTTG TAAGGAAT GT 
★ * ★ * 



3 



Serotype IV 
Serotype V 
Serotype la 
Serotype lb 
Serotype III 
Serotype VI 
Consensus 



Serotype IV 
Serotype V 
Serotype la 
Serotype lb 
Serotype III 
Serotype VI 
Consensus 



Serotype IV 
Serotype V 
Serotype la 
Serotype lb 
Serotype III 
Serotype VI 
Consensus 



Serotype IV 
Serotype V 
Serotype la 
Serotype lb 
Serotype III 
Serotype VI 
Consensus 



Serotype IV 
Serotype V 
Serotype la 
Serotype lb 
Serotype III 
Serotype VI 
Consensus 



Serotype IV 
Serotype V 
Serotype la 
Serotype lb 
Serotype III 
Serotype VI 
Consensus 



851 

AGGATCGCGT 
AGGATCGCGT 
ATTTTTATTT 
ATTTTTGTTT 
GGTAAAGGTT 
GGTAAAGGTT 



ATTTTGGGAG 
ATTTTGGGAG 
AATGGGTTAA 
AAT GCAATTA 
AACTATTTTG 
AACTATTTTG 



TTCTATTAAA 

TTCTATTAAA 

TCCTATTTTT 

TTTTGTTTTG 

TGTTGTTTCT 

TGTTGTTTCT 
•*■ + + 



TCAAATTTTT 
TCAAATTTTT 
TTTAGGGGGA 
TTTAGGATTT 
TATAACAGTT 
TATAACAGTT 



900 

GTGAAATTAG 
GTGAAATTAG 
ACATATTATT 
CTAT AT TAT T 
TTATATT . . . 
TTATATT . . . 



901 950 

ATTTAATAGA AATTAAATAT ATCAATTTTT ATAGGGATGG ACAATTTATT 

ATTTAATAGA AATTAAGTAT GTCAATTTTT ATAGGGATGG ACAATTTATT 

ATTGTTTGCA T AAT AAT AT T C AAAAT AT C A GTATTTTTGG TAGAGATTT G 

ATGCCATATA TTTTGATGTA GAGAAT GTAA GTCTTTTTGG AAGAAATTTA 

TATT TTTTCCTATG CTGAAGCCAA CTTTATTTGG AAGAGAATTG 

TATT TTTTCCAAAT GAATTTACTA CATTCCTAGG AAGAGAT TT A 



★ ■*■ ★ * _ 

951 1000 

CTGAGAAGTG ACT TAGG 

CTGAGAAGTG ACT TAGG 



ATTGGGTCAG ACT GGATTAA TGGTATGCAT ACTCAAAGAG CAATGGGATT 
ATTGGATCAG ATTGGATAAA TGGGATGCAT AC GCAGAGAG CAATGGCTTT 
TTTTCAATAG AGTGGTTTCC AC AT AT G . . . AGAATAAGAC TTGCGGCATA 
TTTTCAATTG AATGGATTCC TTCTATG . . . AAAGTTAGAC TTACT GCAT A 
★ 



1001 

TTTTGGTCAT 

TTTTGGTCAT 

TTTTGAATAT 

CTTT GAATAT 

TTTTGAATAT 

TTTTGAGTAT 
+ + ★ 



CCTAACTTTA 
CCTAACTTTA 
TCAAACCTTA 
TCAAATCTTA 
GCTACACTAA 
GCAACACTAT 



TTCATAATTT 
T T CAT AAT T T 
TAATTCCTAT 
TAATACCCTT 
TTGGTCAGTT 
TAGGT CAGTT 



TTTTGCAGTA 
TTTTGCTCTA 
GACAGT GGTA 
AACTATCATA 
TATTTTATTT 
TATTTTATTC 



1050 
ACTGTTTTTT 
ACTATTTTCT 
ACTAACTATA 
ACTAA . TATA 
TCTTATCCCA 
ACTTATCCGA 



1051 1100 
TAT AT GT AAC ACTTTTTTAT AGAAAACTAA GAT . TAATAA CTATTGCTTT 
TGTATATTGT ACTCAATTAT AAAC GACTAA AGC . CTGTTG TGATGGTTTT 
TATATATATA T.TATATGAA GTTAAGAAAC TATTCAATTA TGACCATAGG 
TATATATATA TATATATTAA GCAAAGATAT AGCTCAGGGA T GAT GAT ACT 



TAC TTTTTTTGAA AC C CCAAAAA CAT AT GGAAA ATATTTTAAT 

TAT TATTTTTAAA ACAGCAGAGG TAT G GAGAAA ATATTTTTAT 

*• + ★ ★ 

1101 1150 



TATTTTAACT CTAAATTACT TCTTGTATCA GTATACTTAT TCAAGAACTG 
ATTTTTAACA TTAAATTATT TATT GT AC CA ATATACTTTT T CAAGGACAG 
TGTTGTATTA TTATTTACCT TTATTTTACC TATTGGATCG GGCTCCAGGG 
CGGTGCTCTT CTCTCCACTA TTATACTACC CATCGGGTCT GGATCTAGAG 
AT C CTT ACT G TT GACTATAT GTTCATACTT TTCTGGCGCT AGAATACTAT 
CACACTATTC CTAGTTTTTT GTGCATATTT GACAGGGGCA AGAATTTTCC 



Serotype IV 
Serotype V 
Serotype la 
Serotype lb 
Serotype III 
Serotype VI 
Consensus 



1151 

GAT AT TAT AT 
GGTATTATAT 
CTGGAATAGT 
CTGGTATTAT 
TGGTCTGTAT 
TAATT T GT AT 



AGTACTCTTA 
CGTAATTTTA 
AGCTATATTG 
AGTTGTGCTA 
GTTGGTTTTA 
GATAATTTTA 



TTTATACTTA 
TTTATTGTAC 
GCGCAGATGT 
CTACAGGTTA 
TTAGCATCGC 
TTAGGTTATT 



T TAT AT AT GT 
T CAT T TAT GT 
TTATTCTTCT 
TAATTTTATT 
TTCTTTTAGA 
TACTCTTAGA 



1200 
TACAAAGAAT 
GACAAAGAAT 
T CTAAAT ACA 
GTTGAATACA 
TTATATCCTT 
AATAAT CATT 



Serotype IV 
Serotype V 
Serotype la 
Serotype lb 
Serotype III 
Serotype VI 
Consensus 



1201 

AACCTGATAA 
AGCTTAATAA 
GTTGTCGTAA 
ATT GT AATAA 
TTTAAAACTA 
AATAAATTTA 



GGAAAATTTT 
AAAGAGTATT 
AGAAGAAAAC 
AAAGACAAAC 
ATTTGAAATT 
AC CTAAAAAT 



TATGATAGTT 

TATGAAATTA 

TATAAAATTT 

GATAAGATTT 

GACCAAGAAA 

TACTAAAAAA 
_ ★ * 



GCTCCGTACA 
GCACCCTATG 
TTATTGTACA 
TTCCTGTATT 
AACACTTTTA 
GCTGTCTTTT 



1250 
TACAACTGTT 
TACAATTTTT 
TACTTCCGTT 
TAGTTCCGAT 
TACTTGGTAT 
T GATAATT AT 



Serotype IV 
Serotype V 
Serotype la 
Serotype lb 
Serotype III 
Serotype VI 
Consensus 



1251 

CTTGTTAGCA 
TTTATTAGTA 
TCTACTAGTA 
ACTAATATTA 
GACTTTCTTA 
AGGGATAATA 



TTTACTTTTC 
TTTACCTTTT 
ATAGTAATGA 
CTATTAGTGA 
TTTAT CACCG 
TTATTATTGG 



TTTGCTCTAC 
TGAGTTCTAC 
TGTTATATTT 
TATTACGTTT 
CTTGTTTTTC 
TATGTTTTTC 



TATTTTTTTC 
AATTTTTTTT 
T GATAACTT A 
TGATAATTTG 
T T AT AAC AT A 
TTACAAAGTG 



1300 
AACTCAAATT 
AATTCAAATT 
CTAT CT AT AT 
GT GAGCAT AT 
TGGTCAATAA 
GAGTCTATTA 



Serotype IV 
Serotype V 
Serotype la 
Serotype lb 
Serotype III 
Serotype VI 
Consensus 



1301 

TTGTTCAAAA 
TTGTTCAAAA 
ATT AT C GT AT 
ATAATAGAAT 
TTGAAAAAAT 
T CAATT AT AT 



ATTAGATAGC 
ATTAGATGTT 
AATTAATTTG 
AATCAATTTG 
AATTAT GTAC 
AATACACTAT 



CTTTTGACAG 
CTTTTAACAG 
CGATCCGGGA 
CGGTCGGGAA 
AGAAAC C AAA 
AGATTTCAAA 



1350 

GTAG 

GTAG 

GTAGTGAATC CAGATTTTCT 
GTAGTGAATC TAGATTTTCT 
GT ACT AT CAC TAGGATGATA 
GTAGTAGTAC AAG AT T G AC A 



1351 1400 

Serotype IV . . GTTAAACT ATGCTCATTT ACAGCTTGTA GACGGCTTAA CTCTTTTTGG 

Serotype V . .ATTACACT ATGCTCATTT ACAACTTGTA GATGGTTTAA CTCCTTTTGG 

Serotype la GTATATAAAG ATACAGTAAA CAT C GTTATA AATAAT T CT T TATTATTTGG 

Serotype lb TTGTACAAGG AT AC C GTACA CT CAGTAAT T ACTGACTCAC TATTTCTGGG 

Serotype III GTTTATCAAG AAAGTATTAT TGAAGTTCTA AAAGGAAATA TTTTATTTGG 

Serotype VI GTCTATTACG AAAGTATAAG AGCGATTTTA GAT GGGAATT TCCTTATTGG 

Consensus * * — * * — *- *_+* 

1401 1450 

Serotype IV AAATAGTTTT AAGGAG A CGAGTGTCCT 

Serotype V AAATAGTTTT AAGGAA A CAAGTGTCCT 

Serotype la AGAAGGAGTT AAAGAGTTAT GGTTAAATAG TGATCTACCT TTGGGGTCGC 

Serotype lb AAAAGGTGTA AAAGAATT GT GGTTAAATAG T GAT TT AC CA CTAGGATCGC 

Serotype III ACAGGGTATA AGGA. . . TTC CATCAAGTGA AGGAATATTC CTAGGATCGC 

Serotype VI GCAAGGTATA AGAG...TTC CCTCCAGTGT GGGAATATTT TTAGGTTCAC 

Consensus — * — * — *- + * — ** — 



5 



Serotype IV 
Serotype V 
Serotype la 
Serotype lb 
Serotype III 
Serotype VI 
Consensus 



1451 

ATTTGATAAT 
ATTTGATAAT 
ATT CAACGTA 
ATTCGACCTA 
ATTCTACTTA 
ATTCATCATA 



AGCTACTCTA 
AGCTACTCTA 
TATAGGCTAT 
CATAGGTTAT 
TATTAGTGTC 
CATTAGTATA 



TGTTATTGAG 
T GTTATT GAG 
TTCTACAAAA 
TTCTATAAAA 
TTTTACAGGA 
TTTTATAGAA 



TATGTATGGT 
TATGTATGGT 
GTGGCCTGCT 
CTGGCCTATT 
CTTCTTTATT 
CTTCTTTTAC 



1500 
GTAGTACTTA 
GTAGTACTTA 
GGGATTAATG 
TGGACTAATA 
AGGAATTGTT 
GGGGCTGTTT 



1501 1550 

Serotype IV CCATGTTTTG TATGATAATC T ATT AT AT CT ATAGTAAAAA AGTCAATGTA 

Serotype V CCATGTTTTG TATGATAATC TATT AT AT CT ATAGTAAAAA GATAATCATA 

Serotype la AATATAGTTC CAGGTTTGCT TTTAAT . TTT TACTAATATT GGTAGGAAAG 

Serotype lb AATGTGATTT TAGGTTTGTT TCTAAT . TCT TATTAGCATT AT CAAGGAAG 

Serotype III CTTTATTTTT CTGCCTTTAT ACTTTTATAT AAAGAAGCGA TTTCAAAAAA 

Serotype VI CTTTTCTTTT CAATATTACT TTTTCTATAT AGAGAAGCTA TCAAACAAAA 

Consensus **- * — * — *-*-* 

1551 1600 

Serotype IV GTTGAGCTCC AGATACTTTT GTTTA TA 

Serotype V ATTGAACTTC AACTACTCCT ATTTA TA 

Serotype la CTAAACAATC AGCTTTTTAT TATGAGATAG TAGGAACACT TATAACTTTA 

Serotype lb CTAAAAAGT C AGATTTCTAT TATGAGATAG TAGGGTCTGT CATACTCCTA 

Serotype III TTATAAAAT C TACAGATTAT TTT T TTATACGTTA 

Serotype VI CAGGATAATC TACAAGCTTT TTT T TGGATTGTTA 

Consensus * * * — * ** 



Serotype IV 
Serotype V 
Serotype la 
Serotype lb 
Serotype III 
Serotype VI 
Consensus 



1601 

AT GT CT AT AG 
AT GT CT AT AA 
TTCTCATTTT 
TTTTCATTTT 
TTAT GTTACA 
TT AT T GT AT A 



TAT T ATTTAC 
TAT T ATTT AC 
TTGCACTTGA 
TTGCACTTGA 
CGCTCTTTGA 
TGGTATTTGA 



AGAGAGTTTT 
TGAAAGTTTT 
AGATCTTGAC 
AGATATTGAT 
G G AAAT AG AT 
AGAATTTGAT 



TACCCAAGTA 
TATCCCAGTG 
GGAGCTAATT 
GGCGCCAATT 
CCTAATCATT 
CCTAATCATT 



1650 
TAGTTATGAA 
TGGTAATGAA 
GGCTTATTGT 
GGCTCATTAT 
GGAGTATTGT 
GGAGTGTTGT 



Serotype IV 
Serotype V 
Serotype la 
Serotype lb 
Serotype III 
Serotype VI 
Consensus 



1651 

TATTAGTTGG 
TATTAGTTGG 
TTT TATT TTT 
TTTTGTCTTT 
AT T ATTATT C 
ATTGTTATTT 



ATGGTTTTTG 
CTAGTTTTTG 
ACAGT GT TAG 
ACAGTGTTGG 
TCAACTTTTG 

ACTACATTAG 




GGAAAATATT 
GTAAAATATT 
GAATTTTAGA 
GAATTTTAGA 
GTATAGT GGG 
GTATAGTAGG 



TTGTGGGGGT 
TTGTGATGGT 
AAATAAGGAT 
AAATAAGGAT 
AAGGGCTAA. 
GAGAG.GGA. 



1700 
GTAGATGATT 
AT CGAAC CT A 
TTTTATAGTC 
TTCTATAGTC 



1701 1750 

Serotype IV TACAAC GAGAGTT CACTTGGACG GCAAATAAAA ATTAGTGTAA 

Serotype V TAAAAA AGGAATT TACT, . .ATT GT GAATAAT A TATGACATAT 

Serotype la AACT TAAAAG GT GGAAAAGT TAATGGAAAA AC GAAT ACTT GTTTCTATCA 

Serotype lb AACTTAAAAG GT GGGAAAGT TAATGGAAAA ACAAATACTT GTTTCTATCG 

Serotype III AAAAT GAAAGAAAAA GTAACAGTCA 

Serotype VI ATGAT AAAAAAACTA GTTAGTGTGA 

Consensus * — * *-- 

1751 1800

Serotype IV TTGTACCAGT ATATAATTCG AAACAATATT TAATAGCTTG CGTTGATTCA

Serotype V TTGCTCTGAT ATGGCAGGAG GTAAGGAAGG AAAATGATAC CTAAAGTTAT

Serotype Ia TTATACCTAT ATACAACTCA GAAGCATACC TTAAAGAATG TGTGCAATCC

Serotype Ib TTATACCTAT ATACAACTCG GAAGCATATC TTAAAGAATG CGTGCAATCC

Serotype III TTATACCTAT ATACAACTCA GAAGCATACC TTAAAGAATG TGTGCAATCC

Serotype VI TTGTTCCAGT TTATAATTCG GAGTTAGTGA TTGAGAACTG TGTAGAATCT

Consensus



1801 1850

cpsl/M

Serotype IV ATTAGAAAAC AAACATATAA GAATTTGGAA ATTATTCTTG TTAATGATGG

Serotype V ACATTATTGT TGGTTTGGAG GAAATCCCTT ACCAGATAAT TTAAAGAAAT

Serotype Ia GTACTACAAC AGACTCATCC ATTGATAGAA GTTATACTAA TTGATGATGG

Serotype Ib GTCCTACAAC AGACTCATTC ATTGATAGAA GTTATACTGA TTAATGATGG

Serotype III GTACTACAAC AGACTCATCC ATTGATAGAA GTTATACTAA TTGATGATGG

Serotype VI TTGCTTCAAC AAACATACCC AGAAATAGAA ATTTTATTAA TAGATGATGG

Consensus



1851 1900

Serotype IV ATCAACAGAT GGTAGTAAAG AGTTATGTGA GGAGATAAGA AAATCAGATG

Serotype V ATATAAAAA. ...CTTGGAG AGAACAATGT CCGGATTATG AAATTATTGA

Serotype Ia ATCCACTGAT AATAGTGGAG AAATTTGTGA TAATTTATCT CAAGAAGATA

Serotype Ib ATCCACTGAT AATAGTGGAG AAATTTGTGA TAATTTATCT CAAAAAGACG

Serotype III ATCCACTGAT AATAGTGGAG AAATTTGTGA TAATTTATCT CAGGAAGATA

Serotype VI ATCTACAGAT AAAAGTAGTC ATATTTGTAA TAATTTTTTA AAAAGGGATA

Consensus * * *



1901 1950 

Serotype IV AAAGAATTAA GACATTTCAC AAAACAAATG GAGGACAATC AAGCGCAAGG

Serotype V ATGGAATGAG CATAATTATG ATGTTAGTAA AAATGTTTTT ATGAGAGAAG

Serotype Ia ATCGCATACT TGTATTTCAT AAAAAAAATG GAGGGGTCTC TTCGGCAAGG

Serotype Ib ATCGCATACT TGTATTTCAT AAAAAAAATG GAGGGGTATC TTCGGCAAGG

Serotype III ATCGCATACT TGTATTTCAT AAAAAAAATG GAGGGGTCTC TTCGGCAAGG

Serotype VI GTCGCGTAAA AGTCTATCAT AAATACAATG GAGGTGCATC ATCAGCAAGA

Consensus * — * * * -* *- * — * — 

1951 2000 

Serotype IV AATTTAGGTA TTTTATACTC TACAGGAGAT TTGATTGGTT TTGTTGACAG 

Serotype V CATATACTAA GAAGAATTT TGCT TATGTTTCTG ACTATGCAAG

Serotype Ia AACCTAGGTC TAGATAAATC CACAGGAGAA TTCATAACAT TTGTGGATAG

Serotype Ib AACCTAGGTC TTGATAAATC CACAGGCGAA TTCATAACGT TTGTAGATAG

Serotype III AACCTAGGTC TAGATAAATC CACAGGAGAA TTCATAACAT TTGTGGATAG

Serotype VI AATGTGGGAC TTGAGATGGC AGAAGGTGAA TTTATAACTT TTGTAGATAG

Consensus -* — * * — * * * — ** 



2001 2050

Serotype IV CGACGATACA ATTGACCCTA AAATGTATGA AACGTTACTA AATATATATG

Serotype V ATTGGATATT ATTTATACTT ATGGGGGGTT CTATCTAGAT ACTGATGTGG

Serotype Ia TGATGATTTT GTAGCACCGA ATATGATTGA AATAATGTTA AAAAATTTAA

Serotype Ib TGATGATTTT GTAGCACCGA ATATAATTGA AATAATGTTA AAAAATTTAA

Serotype III TGATGATTTT GTAGCACCGA ATATGATTGA AATAATGTTA AAAAATTTAA

Serotype VI CGATGATGTT GTCGCACTAA ATATGATTGA AATTATGCTG AATAATTTGT

Consensus * +






2051 2100

Serotype IV AAGATGAACA AGTAGACTGG GTGCAATGTA ATCACAAAAA AATTTACTCT

Serotype V AGCTTTTAAA AAGTTTAGAT CCTTTGAGGA TTCATGAGTG TTTTCTAGCA

Serotype Ia TCACTGAGAA TGCTGATATA GCAGAAGTAG ATTTTGA... TATTTCGAAT

Serotype Ib TCACTGAGGA TGCTGATATA GCAGAAGTAG ATTTTGA... TATTTCGAAT

Serotype III TCACTGAGAA TGCTGATATA GCAGAAGTAG ATTTTGA... TATTTCGAAT

Serotype VI TAACGGAGAA CGCAGATATA TCAGAAATTG ATTTCGA... AGTTTCAGAT

Consensus



2101 2150

Serotype IV AACGGTGTTA ACTTATATTA TAATGGACCT GAATACTATA ATGTGCTTAA

Serotype V AGGGAGATTA GTTGTGATGT GAATACAGGA TTAATAATTG GCGCTGTTAA

Serotype Ia GAGAGAGATT ATAGAAAGAA GAAAAGACGA AACTTTTATA AAGTCTTTAA

Serotype Ib GAGAGAGATT ATAGAAAGAA AAAAAGACGA AACTTTTATA AGGTCTTTAA

Serotype III GAGAGAGATT ATAGAAAGAA GAAAAGACGA AACTTTTATA AAGTTTTTAA

Serotype VI GA...TTTTT ATAAAAGAAA AAAAAGAAAA GGTTACTATA GAGTTTTTCA

Consensus *-



2151 2200

Serotype IV TAAACAAGAT TTCCTATACG AATTTCTGAG TACAAATAAG ATTTTTAGTT

Serotype V AGGACATCAC TTTTTAAAAT CAAATATGTC TATATATGAC AAAAGTGATT

Serotype Ia AAACAATAAC TCTTTAAAAG AATTTTTATC AGGCAATAGA GTGGAAAATA

Serotype Ib AAACAATAAT TCTTTAAAAG AATTTTTATC AGGTAATAGA GTGGAAAATA

Serotype III GAATAATAAC TCTTTGAAAG AATTTTTATC AGGTAATAGA GTGGAAAATA

Serotype VI AAACAATAAG TCTCTCAAAG AATTTTTTTC AGGAAATAAA GTAGAAAATG

Consensus



2201 2250

Serotype IV CAGTCTGCGA GGGGTTGTTA TCTAGAGATT TAGCTTTAAA AATAAAATTC

Serotype V TAACTTCTCT TAATAAGACA TGTGTAGAGG TTACAACTAA TTTATTGATA

Serotype Ia TTGTTTGTAC AAAATTATAT AAAAAAAGTA TAATTGGCAA CTTGAGGTTT

Serotype Ib TTGTTTGTAC AAAATTATAT AAAAAAAGTA TAATTGGTAA CTTGAGGTTT

Serotype III TTGTTTGTAC AAAATTATAT AAAAAAAGTA TAATTGGTAA CTTGAGGTTT

Serotype VI TTGTTTGGGG GAAATTATAT AAAAAAAGCA TTATTGGGGA TTTACGATTT

Consensus



2251 2300

Serotype IV CGTGAAGAAA AAAAAT...A TGAAGATACA CAGTTTTATT TTGATCTCAT

Serotype V AACAGAGGGC TTAAGA...A TAAGAATATT ATTCAAAAGA TTGA..TGAT

Serotype Ia GATGAGAACT TAAAAATTGG TGAGGATTTA CTTTTTAATT GTAAACTCTT

Serotype Ib GATGAGAATT TAAAAATTGG TGAGGATTTA CTTTTTAATT GTAAAATTTT

Serotype III GATGAGAACT TAAAAATTGG TGAGGATTTA CTTTTTAATT GCAAACTCTT

Serotype VI AATGAAAAAT ACAAAATTGG TGAAGACTTG CTATTTAACT TTCAGATTTT

Consensus * * *



2301 2350

Serotype IV AAAAAATGCT AATAAGTTTG TTATTATAAG CCAACCTTTT TATAATTACT

Serotype V ATAACAATAT ATCCGAGAAA TTATTTTAAT CCAAAGAATT TATTAACA..

Serotype Ia ATGTCAAGAG CACCGTATAG TCGTAGATAC GACTTCTTCC TTATATACTT

Serotype Ib ATGTCAAGAG CACTGCATAG TCGTAGATAC GACTTCTTCC TTGTACACCT

Serotype III ATGTCAAGAG CACCGTATAG TCGTAGATAC GACTTCTTCC TTATATACTT

Serotype VI AAATAAAGAA CATCGTATAG TTGTAGATAC TAGAAGATCA CTCTATACTT

Consensus






2351 2400

Serotype IV ACTACAGAAA AAATAGTACA ACAACTTCCT CATATAGTAG CTATCAATGG

Serotype V ..GGTAAGGT TGATTGTCTG ACTAGTGTTA CCTATTCTAT ACATCATTAC

Serotype Ia ATCGAATTGT AAAAACTTCC GCAA.TGAAT CAGAAATTCA ACGAAAACTC

Serotype Ib ATCGCATCGT AAAGACTTCT GCAA.TGAAT CAGGAGTTCA ACGAAAATTC

Serotype III ATCGAATTGT AAAAACTTCT GTAA.TGAAT CAGAAATTCA ACGAAAACTC

Serotype VI ATCGTATTGA AGAAAAATCT ATAA.TGAAT CAACAATTTA ATAAAAATAC

Consensus



2401 2450

Serotype IV GACATAATCG ATATCTGTAC TGAGTGTTAT TATTATGCAA AGGATTTTAA

Serotype V GAAGGAAGTT GGAAAAGTTC TTCATTTATT TCAGATTCTC TAAAGATTAG

Serotype Ia ATTAGATTTT ATAACAATTT TTAATGAAGT AAGTAGTTTG GTTCCTGCCA

Serotype Ib ATTAGATTTT ATAACAATTT TTAATGAAAT AAGCAGTATT GTTCCTGCAA

Serotype III ATTAGATTTT ATAACAATTT TTAATGAAAT AAGTAGTTTG GTTCCTGCCA

Serotype VI ATTAGACTTC ATTGATATTT TTAATGAGAT TCATCAGGAT AGTCCGACAG

Consensus



2451 2500 

Serotype IV TGGATTTGAA GAAGTTGCTT TTTCAAGATT ATTTGGTGCA TATTCGTTAG

Serotype V AGTAAGGCTC ATAATTGATT TTTTATTTGG ATATGGTACT TATAGAATGC

Serotype Ia AATTGGCTAA TTATGTTGAA GCGAAATTTT TAAGAGAAAA GATAAAGTGT

Serotype Ib AATTAGCTAA TTATGTTGAA GCGAAATTTT TAAGAGAAAA GGTAAAGTGT

Serotype III GATTAGCTAA TTATGTTGAA GCGAAATTTT TAAGAGAAAA GATAAAGTGT

Serotype VI AATTGTTTAA TTATGTGGAA GCGAAGTTTG TACGAGAAAA AATCAAGTGT

Consensus — * — * * * — * * — 

2501 2550 

Serotype IV TAGCTAATAA AATTGTATAT AATAAAGATT ATAGAAAAAC CGAAGAATTA

Serotype V TTCTAAGGTT TCTAAAGTTA AAGAAATAG

Serotype Ia CTCCGAAAAA TGTTTGAATT AGGTAGTAAT ATTGACAATA AAATCAAAGT

Serotype Ib CTCCGAAAAA TGTTTGAATT AGGTAGTAAT ATTGACAGTA AAATCAAATT

Serotype III CTCCGAAAAA TGTTTGAATT AGGTAGTAAT ATTGACAATA AAATCAAAGT

Serotype VI TTAAGGAAAA TGTTTGAATT AGGAGAAATA GCTGATGAAA ATTTACGTTT 

Consensus * *— 

2551 2600 

Serotype IV AGATAA 

Serotype V 

Serotype Ia ACAACGAGAG ATTTTTTTCA AAGACATTAA ATCATACCCG TTCTATAAAG

Serotype Ib ACAACGAGAG ATTTTTTTCA AAGATGTTAA ATTATACCCT TTCTATAAAG

Serotype III ACAACGAGAG ATTTTTTTCA AAGACATTAA ATCATACCCG TTCTATAAAG

Serotype VI ACAGAGATAT AAATTTTGGC AAGATATTAA ATCATATTCA ATATGCAAAG

Consensus 

2601 2650 

Serotype IV 

Serotype V 

Serotype Ia CGGTAAAATA CTTATCATTA AAGGGATTAT TAAGCTTTTA TTTAATGAAA

Serotype Ib CGGTTAAGTA CTTATCATTA AAGGGATTAT TGAGTATTTA CTTAATGAAA

Serotype III CGGTCAAATA CTTATCATTA AAGGGATTAT TAAGCTTTTA TTTAATGAAA

Serotype VI CAATAAGGTT CTTATCTAAA AAACATATCT GTACGTTATA TTTGATGAAA

Consensus 
Figure 4. Two sites (*) of sequence heterogeneity between alp2 (AF208158, upper lines) and alp3 (AF291065, lower lines) used to distinguish them (relevant primers are shown).

251 AAGGTAATCTTAATATTTTTGAAGAGTCAATAGTTGCTGCATCTACAATT 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
531 AAGGTAATCTTAATATTTTTGAAGAGTCAATAGTTGCTGCATCTACAATT 580

bcaS1

301 CCAGGGAGTGCAGCGACCTTAAATACAAGCATCACTAAAAATATACAAAA 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
581 CCAGGGAGTGCAGCGACCTTAAATACAAGCATCACTAAAAATATACAAAA 630

bcaS2

351 CGGAAACGCTTACATAGATTTATATGATGTAAAGAATGGATTGATTGATC 400
    ||||||*||||||||||||||||||||||||||||||||||||||*||||
631 CGGAAATGCTTACATAGATTTATATGATGTAAAGAATGGATTGATCGATC 680

401 CTCAAAACCTCATTGTATTAAATCCATCAAGCTATTCAGCAAATTATTAT 450
    ||||||||||||||||||||||||||||||||||||||||||||||||||
681 CTCAAAACCTCATTGTATTAAATCCATCAAGCTATTCAGCAAATTATTAT 730

balS

451 ATCAAACAAGGTGCTAAATATTATAGTAATCCGAGTGAAATTACAACAAC 500
    ||||||||||||||||||||||||||||||||||||||||||||||||||
731 ATCAAACAAGGTGCTAAATATTATAGTAATCCGAGTGAAATTACAACAAC 780

501 TGGTTCAGCAACTATTACTTTTAATATACTTGATGAAACTGGAAATCCAC 550
    ||||||||||||||||||||||||||||||||||||||||||||||||||
781 TGGTTCAGCAACTATTACTTTTAATATACTTGATGAAACTGGAAATCCAC 830

551 ATAAAAAAGCTGATGGACAAATTGATATAGTTAGTGTGAATTTAACTATA 600
    ||||||||||||||||||||||||||||||||||||||||||||||||||
831 ATAAAAAAGCTGATGGACAAATTGATATAGTTAGTGTGAATTTAACTATA 880

601 TATGATTCTACAGCTTTAAGAAATAGGATAGATGAAGTAATAAATAATGC 650
    ||||||||||||||||||||||||||||||||||||||||||||||||||
881 TATGATTCTACAGCTTTAAGAAATAGGATAGATGAAGTAATAAATAATGC 930

651 AAATGATCCTAAGTGGAGTGATGGGAGTCGTGATGAAGTCTTAACTGGAT 700
    ||||||||||||||||||||||||||||||||||||||||||||||||||
931 AAATGATCCTAAGTGGAGTGATGGGAGTCGTGATGAAGTCTTAACTGGAT 980

balA 
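The two starred columns in Figure 4 are single-base differences between alp2 and alp3 segments that are otherwise identical. They can be located mechanically. The Python sketch below illustrates this, assuming a gap-free pairwise alignment of equal-length segments, as in the figure. The helper names `mismatch_sites` and `match_line` are hypothetical and are not part of the described typing method. The two input strings are the 351-400 (alp2, AF208158) and 631-680 (alp3, AF291065) segments transcribed from the figure.

```python
def mismatch_sites(upper, lower, upper_start, lower_start):
    """List mismatched columns in a gap-free pairwise alignment.

    Returns (upper_pos, lower_pos, upper_base, lower_base) tuples, where the
    positions are 1-based coordinates in each sequence's own numbering.
    """
    if len(upper) != len(lower):
        raise ValueError("aligned segments must have equal length")
    sites = []
    for offset, (a, b) in enumerate(zip(upper.upper(), lower.upper())):
        if a != b:
            # No gaps are assumed, so each column advances both coordinates by one.
            sites.append((upper_start + offset, lower_start + offset, a, b))
    return sites


def match_line(upper, lower):
    """Render '|' for identical columns and '*' for mismatches, as in the figure."""
    return "".join("|" if a == b else "*" for a, b in zip(upper.upper(), lower.upper()))


# Segments transcribed from the figure: alp2 (AF208158) 351-400, alp3 (AF291065) 631-680.
alp2 = "CGGAAACGCTTACATAGATTTATATGATGTAAAGAATGGATTGATTGATC"
alp3 = "CGGAAATGCTTACATAGATTTATATGATGTAAAGAATGGATTGATCGATC"

for site in mismatch_sites(alp2, alp3, 351, 631):
    print(site)  # (357, 637, 'C', 'T') then (396, 676, 'T', 'C')
print(match_line(alp2, alp3))
```

On these segments the sketch recovers the two starred columns:

- alp2 position 357 (C) against alp3 position 637 (T).
- alp2 position 396 (T) against alp3 position 676 (C).

A gapped alignment would need coordinate counters that skip gap characters, which this sketch deliberately omits.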



